Saturday, March 14, 2015

Production-ready Mesosphere cluster on Azure with a single command

Lately I have been busy exploring how a production-ready Mesosphere cluster can be built on top of Azure. The journey was interesting because it took me through quite a few technologies that I knew almost nothing about when I started. I was excited and amazed enough by their capabilities that I feel I should share the experience. This article aims to explain these technologies to a beginner rather than to DevOps ninjas.

Before going further, let’s set up some very basic descriptions of the technologies and components used in the process. What better place to start than Docker?

Docker

According to the Docker site, Docker is an open platform for developing, shipping, and running applications. Docker is designed to deliver your applications faster. With Docker you can separate your applications from your infrastructure and treat your infrastructure like a managed application. Docker helps you ship code faster, test faster, deploy faster, and shorten the cycle between writing code and running code. Docker builds on Linux container technology (originally LXC) and a layered file system such as AuFS. Docker containers are easily confused with virtual machines (in fact there are a few questions on Stack Overflow about this), but they differ from virtual machines in many respects. A container is significantly more lightweight than a VM. More importantly, Docker can work with delta changes. Let’s try to understand what that means with an example scenario:

Let’s say we have an application that runs in a web server (Apache, for example) and serves an HTML document and some JavaScript files. We can describe how the application is constructed in a Dockerfile, a small script written in Docker’s own DSL. It specifies that the application needs an OS (let’s say Ubuntu), then the Apache web server, then that the HTML and JS files should be copied into a certain directory, that a port should be opened for Apache, and so on.

With that Dockerfile, we can instruct Docker (a daemon process that runs once Docker is installed) to build an image from the file. Once the image is built, we can ask Docker to run it (as an instance, called a container) and that’s it! The application is running. Metaphorically, the image is to a container what a VHD is to a virtual machine.
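To make this concrete, here is a minimal sketch of such a Dockerfile, written out by a small shell script. The directory name hello-apache, the ubuntu:14.04 base tag and the image name are assumptions made purely for illustration; they do not come from a real project.

```shell
#!/bin/sh
set -eu

# Hypothetical project directory (override with APP_DIR).
app_dir="${APP_DIR:-./hello-apache}"
mkdir -p "$app_dir/site"
echo '<h1>Hello from a container</h1>' > "$app_dir/site/index.html"

# The Dockerfile: base OS first, then Apache, then our files, then the port.
cat > "$app_dir/Dockerfile" <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y apache2
COPY site/ /var/www/html/
EXPOSE 80
CMD ["apache2ctl", "-D", "FOREGROUND"]
EOF

echo "Wrote $app_dir/Dockerfile"

# With Docker installed, build the image and run it as a container:
#   docker build -t hello-apache "$app_dir"
#   docker run -d -p 8080:80 hello-apache
```

The script only writes the files; the two docker commands are left as comments so that it can be tried on a machine without Docker.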

It gets more interesting when the Docker registry (a.k.a. hub) comes into the picture. Notice that in our Dockerfile we first said we need Ubuntu as the OS. So how does that become part of our image during the Docker build? There is a public registry (Docker Hub, pretty much like GitHub) where plenty of images are made available by numerous contributors. One of them is a base image that contains only the Ubuntu OS, and in our Dockerfile we simply named that image as our base image. On top of it we added the Apache web server (as a layer) and then our HTML files (a second layer). When the Docker daemon builds the image, it looks in the local cache for the base Ubuntu image and, if it is not found, fetches it from the public Docker registry. Then it creates the other layers on top to compose the image we want.

Now if we change our HTML files (add or remove some) and ask the Docker daemon to build again, the build is significantly faster, because Docker recognizes the deltas and doesn’t download Ubuntu or install Apache again. It only rebuilds the layer that changed and delivers a new image, which we can run to see our changes reflected as expected. One can also set up a private Docker registry, in which case the images are not publicly exposed, which suits enterprise business applications.

This feature makes Docker really powerful for a continuous deployment process, where the build pipeline can output a Docker image of the application component and push it to the registry (public or private hub), and production can then pull that new image (faster, because only the deltas are transferred) and run it. Pretty darn cool! To learn more about Docker, visit their site.
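The build-push-pull-run cycle could be sketched roughly like this. The registry host registry.example.com, the image name and the build number are hypothetical, and by default the script only prints the docker commands instead of executing them.

```shell
#!/bin/sh
set -eu

# Hypothetical names: replace with your own registry and image.
REGISTRY="${REGISTRY:-registry.example.com}"
IMAGE="${IMAGE:-hello-apache}"

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build side: build the image, tagged with a build number, and push it.
publish_image() {
  tag="$REGISTRY/$IMAGE:$1"
  run docker build -t "$tag" .
  run docker push "$tag"
}

# Production side: pull the new image and run it.
deploy_image() {
  tag="$REGISTRY/$IMAGE:$1"
  run docker pull "$tag"
  run docker run -d -p 80:80 "$tag"
}

publish_image 42
deploy_image 42
```

In a real pipeline, publish_image would run on the build server and deploy_image on the production host.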

Vagrant

Vagrant is a tool for building complete development environments, sandboxed in a virtual machine. It helps enforce good practices by encouraging the use of automation so that development environments are as close to production as possible.

It’s a tool that addresses the infamous “works on my machine” problem. A developer can build an environment and create a Vagrantfile for it, and Vagrant makes sure that the same Vagrantfile gives other developers the exact same environment to run the same application.

A Vagrantfile is like a Dockerfile (described above), except that it defines VMs (with their network needs, port forwarding, file sharing, etc.). When we run such a file from the console with

vagrant up

Vagrant uses a virtual machine provider (Oracle VirtualBox, for example) to provision the VMs. Once a machine has booted, Vagrant also lets us run scripts written for Ansible, Puppet, Chef or even plain old bash inside the guest VMs to prepare them as needed. Bash isn’t idempotent out of the box, whereas tools like Ansible and Puppet are designed to be idempotent, which makes them the tools of choice. Vagrant in conjunction with these system configuration technologies can provide true infrastructure as code.

It has been over a year since MSOpenTech developed an Azure provider for Vagrant. It now allows us to manage infrastructure in a Vagrantfile and possibly use the same file to provision identical infrastructure both on a developer’s local machine and in the Azure production environment, in exactly the same way and easily (possibly with a single command).

So now we know that Docker lets us containerize an application and ship it into production exactly the way we like, and that Vagrant, with or without Ansible, Puppet, etc., can build the required infrastructure. With these we can run a few application instances nice and smooth in production. But the problem gets a little complicated when we want to scale our applications up, out or down. In a microservice scenario the problem is amplified considerably: an application can easily end up with numerous Docker containers running on multiple machines, and managing or even keeping track of those instances can easily become a nightmare. Obviously we need some automation to manage the container instances, scale some of them up or out based on demand, allocate resources (CPU and RAM, unevenly, according to each application’s needs), spread them over multiple machines to achieve high availability, and make them fault tolerant by spinning up new instances in case of a failure.

That’s a hell of a lot of work! The good news is that we don’t need to build that beast ourselves. There are solutions that address such scenarios, and Mesosphere is one of them.

Mesosphere

Mesosphere, as their site describes it:

it’s like a new kind of operating system. The Mesosphere Datacenter Operating System (DCOS) is a new kind of operating system that spans all of the servers in a physical or cloud-based datacenter, and runs on top of any Linux distribution.
That sounds like a big thing, and it indeed is. The Mesosphere DCOS includes a rich ecosystem of components. The components this article focuses on are as follows:

Apache ZooKeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

Mesos

Mesos site says:

Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. It is an open source software originally developed at the University of California at Berkeley. It can run many applications on a dynamically shared pool of nodes.
It is battle-tested; prominent users of Mesos include Twitter and Airbnb.

Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments. It can scale out to massive clusters of 10,000s of nodes. It is fault tolerant, with masters and slaves replicated using ZooKeeper, and it supports Docker containers.

Mesos has one “leader” mesos-master (with multiple standby masters coordinated through ZooKeeper, which makes it resilient) and multiple mesos-slaves, which act as the worker nodes. The worker nodes report their available resources (the capabilities of the machines) to the master, which passes them on as “offers”. Mesos also supports “frameworks”, which act on the offers made available by the master. Such a framework is essentially a scheduler that decides which workloads are assigned to which worker based on the offers it receives from Mesos. One such framework we will be looking at is Marathon.
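The idea of a scheduler matching workloads against offers can be illustrated with a toy model. This is not the Mesos API; the offer format, the slave names and the first-fit rule are all invented for the example.

```shell
#!/bin/sh
set -eu

# Toy resource offers: "<slave> <cpus> <mem-in-MB>", one per line.
offers='slave1 1 512
slave2 4 8192
slave3 2 2048'

# Print the first slave whose offer covers the requested CPUs and memory.
pick_offer() {
  printf '%s\n' "$offers" |
    awk -v cpus="$1" -v mem="$2" '$2 >= cpus && $3 >= mem { print $1; exit }'
}

pick_offer 2 1024   # prints "slave2"
pick_offer 8 1024   # prints nothing: no offer is large enough
```

A real framework such as Marathon makes far richer decisions (constraints, health, multiple tasks per offer), but the basic shape is the same: compare what a task needs with what each offer provides.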

Marathon

Marathon is a cluster-wide init and control system for services in cgroups or Docker containers.

Marathon is roughly a scheduler framework (actually more than that, as we will see later) that works together with Chronos and sits on top of Mesos.

Marathon provides a REST API for starting, stopping, and scaling applications. Marathon can run in highly-available mode by running multiple copies. The state of running tasks gets stored in the Mesos state abstraction.

Marathon is a meta framework: It can start other Mesos frameworks such as Chronos or Storm with it to ensure they survive machine failures. It can launch anything that can be launched in a standard shell (thus, Docker images too).
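As a rough sketch of what starting a Docker-based application through the REST API looks like, the script below writes an app definition and prints the curl call that would submit it to the /v2/apps endpoint. The Marathon address, app id and image name are hypothetical, and the JSON follows the app format Marathon used around the time of writing; newer Marathon versions changed the networking fields.

```shell
#!/bin/sh
set -eu

# Hypothetical Marathon address and output file.
MARATHON="${MARATHON:-http://localhost:8080}"
app_json="${APP_JSON:-./app.json}"

# App definition: run two instances of a (hypothetical) Docker image.
cat > "$app_json" <<'EOF'
{
  "id": "hello-apache",
  "cpus": 0.25,
  "mem": 128,
  "instances": 2,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "registry.example.com/hello-apache:42",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 80, "hostPort": 0 }]
    }
  }
}
EOF

# Print the call that submits the definition to a running Marathon.
echo "curl -X POST -H 'Content-Type: application/json' -d @$app_json $MARATHON/v2/apps"
```

Marathon then asks Mesos for matching offers and keeps the requested number of instances running.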

See them in action

Now that we have a basic understanding of these components, especially the Mesosphere cluster, let’s build a Vagrant configuration that creates a Mesosphere cluster on our local Windows machine (a laptop is sufficient; I used a Windows 8.1 machine as a playground). We will use three Mesos masters, each of which also has ZooKeeper and Marathon installed, and three Mesos slave machines to run workloads. To prepare the laptop, we first need to download and install Vagrant. The next step is to create the Vagrantfile that contains the infrastructure as code. Here is the script snippet that defines the master VMs; the entire Vagrantfile can be found here.

As we can see, we are defining the master machines with IP addresses starting from 192.0.2.1 and continuing with 192.0.2.2 and 192.0.2.3. (A Vagrantfile is a Ruby file, so it is a fully programmable script.) We can now literally go to this directory from the command prompt and run

$> vagrant up

This should create three VMs in the local Oracle VirtualBox (the default provider here). However, once the machines are created we still need to install Mesos, Marathon and ZooKeeper on them and configure them. Here comes the provisioning part: at the end of the code snippet we tell Vagrant to provision the guest OS with a bash script. This is not the best option in my opinion (because it’s not idempotent); Ansible or Puppet would be better options, but bash makes it easy to understand what is going on.

The master provisioning script is also in the same GitHub repo.

Let’s quickly walk through some crucial parts of the script.
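For readers who want a feel for what a master definition with shell provisioning can look like, here is a minimal sketch; it is not the file from the repository. The box name, host names, script name master.sh and the 192.0.2.101 to 192.0.2.103 addresses are assumptions for illustration. The shell script below simply writes the Ruby Vagrantfile to disk.

```shell
#!/bin/sh
set -eu

# Hypothetical output path (override with VAGRANTFILE).
out="${VAGRANTFILE:-./Vagrantfile.sketch}"

cat > "$out" <<'EOF'
# Sketch only: box name, host names, IPs and script name are assumptions.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  # Because a Vagrantfile is Ruby, a loop can define all three masters.
  (1..3).each do |i|
    config.vm.define "master#{i}" do |node|
      node.vm.hostname = "master#{i}"
      node.vm.network "private_network", ip: "192.0.2.#{100 + i}"
      # Run a bash script inside the guest once it has booted.
      node.vm.provision "shell", path: "master.sh", args: [i.to_s]
    end
  end
end
EOF

echo "Wrote $out (rename to Vagrantfile, then run: vagrant up)"
```

Passing the machine index as an argument lets a single provisioning script configure each master slightly differently.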

Installing the mesosphere package:
sudo apt-get -y install mesosphere
Setting up the ZooKeeper configuration with the IP addresses of all the master machines:
sudo sed -i -e 's/localhost:2181/192.0.2.101:2181,192.0.2.102:2181,192.0.2.103:2181/g' /etc/mesos/zk
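The ZooKeeper address list in that command can also be derived from the list of master IPs rather than typed by hand. The helper names zk_url and quorum below are hypothetical and not part of the repository script; the quorum helper computes the majority value that mesos-master expects for its --quorum option.

```shell
#!/bin/sh
set -eu

# Build "zk://ip1:2181,ip2:2181,.../mesos" from a list of master IPs.
zk_url() {
  hosts=""
  for ip in "$@"; do
    hosts="${hosts:+$hosts,}$ip:2181"
  done
  printf 'zk://%s/mesos\n' "$hosts"
}

# A majority of the masters: 2 out of 3, 3 out of 5, and so on.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

zk_url 192.0.2.101 192.0.2.102 192.0.2.103
# prints zk://192.0.2.101:2181,192.0.2.102:2181,192.0.2.103:2181/mesos
quorum 3
# prints 2
```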

The script in GitHub has comments that explain what these configurations do, so I will not repeat them here. The basic idea is to install and configure the Mesos masters and Marathon for the cluster.

The Vagrantfile also creates three slave machines; these are the machines where workloads will be executed. The slave machines are configured with the Mesos slave software components in the same way we provisioned the master machines. There is a slave script in the above-mentioned GitHub repo.

Now we are pretty much ready to kick it off. Just vagrant up, and your laptop now has a virtual cluster that is conceptually production ready! Of course, no one should use Oracle VirtualBox to build a cluster on a single piece of hardware; that doesn’t make sense. But the code and the idea are absolutely ready to use with a different provider like Azure, AWS, any other cloud vendor, or even our own bare-metal data center.

Taking it one step further

Let’s build the same cluster on Microsoft Azure. MSOpenTech has created an Azure provider for Vagrant, and we will be using it here. There are some limitations, however, that took me a while to work around. The first problem is that the Vagrant provisioning scripts need to know and use the IP addresses of the VMs created by the provider. For VirtualBox this is not an issue, because we can define the IP addresses upfront in our Vagrantfile. But in Azure the IP addresses are assigned dynamically. We also need to use the internal IP addresses of the machines, not the public virtual IP addresses: with virtual IP addresses, traffic between the master servers would go out and come back in through the Azure load balancer, which is costly and slow. With an Azure virtual network we can define IP ranges, but we can never guarantee which machine gets exactly which IP address. I managed to work around this issue by using the Azure CLI and PowerShell.

The workaround goes as follows. A PowerShell script (light.ps1) drives the entire provisioning process. It uses Vagrant to provision the VMs (creating a cloud service for all six machines) and to create and attach disks for them. Once Vagrant has finished booting the machines, the PowerShell script gets control back. It then uses Azure cmdlets to read the machine metadata from the cloud service that was just provisioned.

This metadata includes the internal IP addresses of the machines. The script then creates some bash files in a local directory to configure Mesos, Marathon, ZooKeeper, etc., using the IP addresses retrieved earlier.

Once these provisioning files are available on disk, the PowerShell script calls Vagrant again to provision each machine using those dynamically created bash files. Finally, the process creates Azure endpoints on the appropriate servers so that we can access the Mesos and Marathon consoles from our local machine to administer and monitor the cluster we have just created. All the scripts and Vagrant files can be found in this Git repo.
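The file-generation step can be illustrated with a small sketch. The real tooling is PowerShell; this shell version only shows the templating idea, and the 10.0.0.x addresses, the output directory and the file names are stand-ins, not values from the repository.

```shell
#!/bin/sh
set -eu

# Stand-in for the internal addresses read from the cloud service metadata.
master_ips="10.0.0.4 10.0.0.5 10.0.0.6"
out_dir="${OUT_DIR:-./generated}"
mkdir -p "$out_dir"

# "ip1:2181,ip2:2181,ip3:2181" for the ZooKeeper configuration.
zk_hosts=$(printf '%s\n' $master_ips | sed 's/$/:2181/' | paste -sd, -)

# Write one provisioning script per master with the addresses baked in.
id=1
for ip in $master_ips; do
  cat > "$out_dir/master-$id.sh" <<EOF
#!/usr/bin/env bash
# Generated for master $id ($ip)
sudo sed -i -e 's/localhost:2181/$zk_hosts/g' /etc/mesos/zk
EOF
  id=$((id + 1))
done

ls "$out_dir"
```

Because the addresses are only known after the machines exist, generating the scripts at this point is what lets the second provisioning pass use them.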

The process takes about 25 to 30 minutes, depending on internet speed, but it ends with a production-ready Mesos cluster up and running on Azure. All we need to do is get the PowerShell script and the Vagrantfile and launch “Light.ps1” from the PowerShell command line. Which is kind of cool!

The script has already created endpoints for Mesos and Marathon on the VMs. We can now visit the Mesos management console at a URL like http://cloudservicename.cloudapp.net:5050. A different master may be leading the cluster, in which case the port may be 5051 or 5052; the console displays a message saying so.

Similarly, the Marathon management console can be found at http://cloudservicename.cloudapp.net:8080, where we can monitor and scale tasks with button clicks. It also has a powerful REST API that can be leveraged to take things even further.
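For example, scaling an application without touching the console is a single PUT to /v2/apps/<app id>. The sketch below only prints the curl command; the app id hello-apache is hypothetical and cloudservicename is the placeholder used above.

```shell
#!/bin/sh
set -eu

# Placeholder address; replace cloudservicename with your cloud service.
MARATHON="${MARATHON:-http://cloudservicename.cloudapp.net:8080}"

# Print the curl command that sets an app's instance count.
# Usage: scale_cmd <app-id> <instances>
scale_cmd() {
  printf "curl -X PUT -H 'Content-Type: application/json' -d '{\"instances\": %s}' %s/v2/apps/%s\n" \
    "$2" "$MARATHON" "$1"
}

scale_cmd hello-apache 5
```

Running the printed command against a live Marathon asks it to bring the app to five instances, the same thing the scale button in the console does.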

Summary

There is quite a lot going on here, especially for someone who is new to this territory. But what I can say is that the possibilities it offers probably pay off the effort of learning and dealing with these technologies.