Docker & CoreOS - Part 2

In Part 1 of this series we gave you an overview of the deployment process at Clearbit. In this post I'm going to talk about how we use fleet to run our services.

Note: We use Amazon's EC2 system. In this article, when I refer to a machine I'm really talking about an EC2 instance. I've used "machine" to avoid confusion with an instance (running process) of an application.

Life before Fleet

Before migrating over to CoreOS/Fleet we were using dokku-alt, which is a great Heroku-esque PaaS. For small apps it's a great solution. It has a config system to control environment variables available to your application, deploys are simple and quick (just a git push) and it has support for rollbacks. It worked really well for us in the beginning.

As we grew we started to need more and more instances of each service. dokku-alt had no way of running multiple instances of a service on one machine, so we ran each service on multiple machines. Having several machines meant deploys would require pushing to each machine (one by one, so as not to take the service down on all machines at once). Configuration was also per-machine so any configuration changes for a service needed to be pushed to each machine, which also triggered a restart of the application instance.

Our services became busier still, and we found that although our frontend instances were coping just fine, we needed to increase the number of workers processing jobs on our queues to service the requests. dokku-alt isn't really designed for non-web-facing applications. We managed to get by with some modifications to dokku-alt (submitted and merged upstream), but we felt we were pushing at the limits of the dokku-alt architecture and started to look at other options.

Now what?

After looking into solutions like Deis and Flynn we decided we'd feel happier with something with simpler semantics. We were attracted to Fleet because of its simplicity and flexibility, and the reputation of the CoreOS team.

Fleet is effectively a clustered layer on top of systemd. Fleet uses systemd unit files with an (optional) added section to tell fleet which machines it should run on. There is very little magic.

To co-ordinate between the machines fleet uses CoreOS's etcd, which is a distributed key-value store using the Raft consensus protocol. etcd is also available to us to easily share service configuration across the cluster: no more changing it one node at a time. We can adjust the configuration from any node, and all other nodes will reliably see the change.

The documentation for CoreOS let us get a cluster up and running quickly for trying things out (we'll cover the details of that and how we build clusters in a future post).

We experimented with running some small applications on the cluster and everything seemed to work as advertised. I experimented with a few different patterns found in the fleet documentation, mailing lists and blog posts for wiring docker containers together with fleet. There were some wrinkles, but it looked like this could work well for us. Decision made.


Fleet unit files are systemd unit files with an extra section. Our first unit file (watchlisthub-http@.service) looked basically like this:


ExecStartPre=-/usr/bin/docker kill watchlisthub-%i
ExecStartPre=-/usr/bin/docker rm watchlisthub-%i
ExecStart=/bin/sh -c '/usr/bin/docker run --rm --name watchlisthub-%i \
    -p 5000:5000 \
    -e RACK_ENV=$(etcdctl get /environment/watchlisthub/RACK_ENV) \
    -e DATABASE_URL=$(etcdctl get /environment/watchlisthub/DATABASE_URL) \
    -e REDIS_URL=$(etcdctl get /environment/watchlisthub/REDIS_URL) \
    ... some other environment variables here ...
ExecStop=/usr/bin/docker stop watchlisthub-%i


This unit file defines a service for one of our APIs, Watchlist. In systemd parlance it's a template unit file. Unit files named something@.service define services where you'd like to run multiple instances of the same process with similar options. For example, we might need to run three instances of Watchlist to handle the traffic we're getting. To do this we can write a template unit like the one above, submit it to fleet and then start three instances:

$ fleetctl submit watchlisthub-http@.service
$ fleetctl start watchlisthub-http@1.service
$ fleetctl start watchlisthub-http@2.service
$ fleetctl start watchlisthub-http@3.service

Templates can use some systemd magic, namely specifiers. The %i in the unit file is replaced by systemd with whatever follows the @ when a service is launched from a template. So for our first watchlisthub instance %i would be substituted with 1, for the second with 2, and so on. In this case we use the instance specifier to form a unique container name so that we can be sure we are operating on the correct container with docker.

You can use the instance specifier to do some interesting things. In the watchlisthub unit file the port number is fixed at 5000 (that's what the -p flag to docker is doing), but we could have used %i there instead, allowing us to start three instances of watchlisthub-http listening on different ports, say watchlisthub-http@8001, watchlisthub-http@8002 and watchlisthub-http@8003.
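
As a rough illustration of what the instance specifier resolves to, the shell sketch below mimics the substitution outside of systemd. The helper name instance_of is made up for this example, and the -p %i:5000 form is the hypothetical port variant described above, not what our unit file actually used.

```shell
# instance_of is a hypothetical helper, not part of systemd or fleet.
# It prints what systemd would substitute for %i in a template unit.
instance_of() {
  name=${1%.service}          # strip the unit type suffix
  printf '%s\n' "${name#*@}"  # keep everything after the first "@"
}

for unit in watchlisthub-http@8001.service watchlisthub-http@8002.service; do
  i=$(instance_of "$unit")
  # Container name as in our unit file, plus the per-instance port mapping.
  echo "$unit: --name watchlisthub-$i -p $i:5000"
done
```

Each instance ends up with its own container name and its own host port, while the unit file itself stays identical.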

So now that the specifier has been replaced with the instance name, what do the various lines do?

The ExecStartPre lines ensure that the service is down before we try to bring it up, useful when restarting if the process hung.

Starting a command with a - tells systemd not to give up if the command fails: a non-zero exit is ignored and the next line still runs.

The ExecStart line actually starts the service and runs an unfortunately rather long docker command to create a new container with the right environment variables. We're using etcdctl here to pull the environment variables from etcd.

Lastly, there is the Conflicts line, which is not shown in the excerpt above. This is functionality that fleet provides on top of systemd, allowing you to control which machine is chosen to run an instance.

Amazon's ELB requires that all machines serving a particular load balancer listen on the same port. This means we needed to tell fleet not to try to run multiple instances of watchlisthub-http on the same machine, as only one instance can bind to a given port. The Conflicts line tells fleet not to run a watchlisthub-http instance on a machine which already has a watchlisthub-http service running.
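
In a unit file the Conflicts line sits in a fleet-specific [X-Fleet] section. A minimal sketch of adding it, assuming the template is saved as watchlisthub-http@.service in the current directory and that a glob over all instances of the template is the pattern wanted (the exact line from our unit file is not reproduced here):

```shell
# Assumed file name, matching the template unit discussed above.
unit="watchlisthub-http@.service"

# Append the fleet-specific section. The glob matches every instance
# started from this template, so no two instances share a machine.
cat >> "$unit" <<'EOF'

[X-Fleet]
Conflicts=watchlisthub-http@*.service
EOF
```

The unit is then submitted and started with fleetctl exactly as before; systemd ignores the section, and only fleet's scheduler reads it.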

At this point we had solved our configuration issue from dokku-alt: configuration was easily shared. We were using docker containers to build our application (we'll go into the build system in another post). We could restart one application instance at a time and it would come up with the latest version. Rolling back meant changing a tag in docker to the previous image and restarting.


Happy with how things were going, we started to add more services, a lot more services. As we iterated on our use of fleet and other infrastructure it became difficult to keep unit files updated and make sure things were consistent across services. Most lines of the unit files contained the application name, making diffing of unit files awkward and copy/paste/edit error-prone. Time to refactor.

As you can see, our first version of the unit files used a rather long-winded way of setting environment variables for the container, enumerating each value we wanted to fetch. Thankfully etcd has a concept of directories, so we can get a list of key/value pairs. This became our first fleet pattern:

Configuration for a service would go in /environment/<service_name>/ and our ExecStart became:

ExecStart=/bin/sh -c '/usr/bin/docker run --rm --name watchlisthub-%i \
    -p 5000:5000 \
    $(for e in $(etcdctl ls /environment/watchlisthub); do \
      echo "-e $(basename $e)=$(etcdctl get $e) "; \
    done) \

This lists the contents of the service's environment directory and passes -e NAME=VALUE to docker for each one. Much better, no need to list the environment variable names and update the unit files when they change.
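
To see what the loop produces without a cluster, the sketch below replaces etcdctl with a stub shell function holding two made-up keys. The stub, its values and the docker_env_flags name are assumptions for the example; only the loop itself mirrors the unit file.

```shell
# Stub standing in for the real etcdctl so the loop can run anywhere.
# The keys and values here are invented for illustration.
etcdctl() {
  case "$1 $2" in
    "ls /environment/watchlisthub")
      printf '%s\n' /environment/watchlisthub/RACK_ENV \
                    /environment/watchlisthub/REDIS_URL ;;
    "get /environment/watchlisthub/RACK_ENV")  echo production ;;
    "get /environment/watchlisthub/REDIS_URL") echo redis://localhost:6379 ;;
  esac
}

# Hypothetical wrapper around the same loop used in ExecStart:
# one "-e NAME=VALUE" pair per key in the service's directory.
docker_env_flags() {
  for key in $(etcdctl ls "/environment/$1"); do
    printf '%s %s=%s ' '-e' "$(basename "$key")" "$(etcdctl get "$key")"
  done
}

docker_env_flags watchlisthub
```

Because the output is expanded unquoted inside the docker command, a value containing spaces would be split into several arguments, so this pattern suits simple values best.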

Our next step was to normalize as many of the docker commands as possible.

systemd supports setting of environment variables which will be available to the various Exec commands, so with a sprinkling of environment variable substitution we could do:

ExecStartPre=-/usr/bin/docker kill $CONTAINER
ExecStartPre=-/usr/bin/docker rm $CONTAINER
ExecStart=/bin/sh -c '/usr/bin/docker run --rm --name $CONTAINER \
    -p 5000:5000 \
    $(for e in $(etcdctl ls /environment/$APPLICATION); do \
      echo "-e $(basename $e)=$(etcdctl get $e) "; \
    done) \
ExecStop=/usr/bin/docker stop $CONTAINER

Most lines of the unit files could now be consistent and safely updated en masse, with the differences clearly contained in the environment variables.
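
One way to set such variables is systemd's Environment= directive in the [Service] section. The sketch below prints the two per-service lines for a given application; unit_env_lines is a hypothetical helper and the values shown are illustrative rather than copied from our unit files.

```shell
# unit_env_lines is a hypothetical helper for illustration only.
# It prints the per-service lines that would sit in the [Service] section;
# %%i in the printf format yields a literal %i, which systemd then
# expands to the instance name for each started instance.
unit_env_lines() {
  printf 'Environment=APPLICATION=%s\n' "$1"
  printf 'Environment=CONTAINER=%s-%%i\n' "$1"
}

unit_env_lines watchlisthub
```

With these two lines as the only per-service difference, the Exec lines can be copied between unit files unchanged.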

Much better, but the bash loop to pull the configuration still grated a little, and managing the configuration with etcdctl felt clumsy.

All of a fluster

I wrote a small tool, fluster, to help abstract the etcd patterns we were using. It could be used on the CoreOS machines to read the configuration for an application, and on the client side to adjust the configuration. We added support for importing a configuration from dokku-alt, where our production services still lived. This let us continually sync the CoreOS cluster against our production configs easily.

To keep the unit files short, fluster reads our standardised APPLICATION environment variable and pulls the config from etcd.

ExecStart=/bin/sh -c '/usr/bin/docker run --rm --name $CONTAINER \
    -p 5000:5000 \
    $(/opt/bin/fluster env docker) \

Client side we can easily list and change configuration for an application:

$ fluster config -a watchlisthub
$ fluster config set -a watchlisthub REDIS_URL=redis://...

Later we extended fluster to manage our deploys as well. fluster now allows us to run multiple versions of an application at once. This lets us test the waters for a new release, and quickly take the new version down if we see errors.

Coupled with our use of versioned queues in Sidekiq, we can now deploy with confidence and quickly roll back new releases without any downtime.

In the next article we'll go into more detail on fluster and the fleet/docker patterns we use that allow us to run multiple application versions concurrently without creating hundreds of unit files.