We use Docker and CoreOS extensively at Clearbit. We have over 70 different internal services running across a cluster of 18 machines on EC2. Every day our servers handle upwards of 1.5 million API calls, and that growth rate is only increasing.
All this means being able to scale our infrastructure quickly is incredibly important to the company. We've made significant upfront investments into tools to help us with this and we'd like to share our experiences to help other teams facing similar challenges.
CoreOS & Docker
Building any of this would be a lot harder without standing on the shoulders of two giants: CoreOS & Docker.
CoreOS is a stripped-down operating system designed to run Docker containers. It comes bundled with two key tools: etcd, a distributed key-value store, and fleet, a distributed service manager built on top of systemd.
While CoreOS certainly isn't perfect, we find the tools and primitives it exposes to be at a really great abstraction level. They're very well thought out and don't assume too much about your architecture.
Docker sits somewhere between a full-blown virtual machine and simple code-push tools like Capistrano and FTP. Docker images are largely self-contained: once you've built your application into an image, you can distribute it easily and it will run consistently across your servers.
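For a sense of what building an application into an image looks like, here's a minimal Dockerfile for a Ruby web app. The base image, port, and commands are illustrative placeholders, not our actual build:

```
# Hypothetical Dockerfile for a Ruby app (illustrative only)
FROM ruby:2.2
WORKDIR /app

# Install gems first so this layer is cached between builds
COPY Gemfile Gemfile.lock ./
RUN bundle install

# Copy the rest of the application code
COPY . .

EXPOSE 3000
CMD ["bundle", "exec", "puma", "-p", "3000"]
```

Everything the app needs, from the Ruby runtime to its gems, ships inside the image, which is what makes it run the same way on every server.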
I'm going to take you through a high level look at what happens during a Clearbit deploy. Further articles in this series will go into more depth into each step of the process.
Step one: git push production
To deploy we push to the production git remote, which triggers a build on our deployment server. The build takes the branch, wraps it up in a Docker image, and pushes that image to a locally hosted registry, tagging it with the deployment's git ref.
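Conceptually, the build step boils down to tagging an image with the pushed git ref and pushing it to the registry. This sketch assembles the commands a post-receive hook would run; the registry address and image name are hypothetical, and on a real deploy server the commands would run against a Docker daemon rather than being echoed:

```shell
# Sketch of the build step run by the post-receive hook (names hypothetical).
REF="abc1234"                       # git ref of the pushed branch
REGISTRY="registry.internal:5000"   # locally hosted Docker registry (assumed)
IMAGE="$REGISTRY/app:$REF"          # image tagged with the deploy's git ref

# On the real deploy server these would run against a Docker daemon;
# here we only print the commands the hook would execute.
echo "docker build -t $IMAGE ."
echo "docker push $IMAGE"
```

Tagging by git ref means every deploy produces an immutable, addressable image, so rolling back is just deploying an older tag.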
Step two: new fleet service
The build server then starts a new fleet service, which fleet schedules across our cluster. The service pulls down the relevant Docker image and starts the app.
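A fleet service is described by a systemd-style unit file with an extra [X-Fleet] section for scheduling hints. Here's a hypothetical template unit along these lines; the image name and registry address are placeholders:

```
# app@.service — hypothetical fleet template unit (illustrative only)
[Unit]
Description=App instance %i
After=docker.service
Requires=docker.service

[Service]
# Pull the image tagged with this deploy's git ref, then run it
ExecStartPre=/usr/bin/docker pull registry.internal:5000/app:%i
ExecStart=/usr/bin/docker run --rm --name app-%i -P registry.internal:5000/app:%i
ExecStop=/usr/bin/docker stop app-%i

[X-Fleet]
# Don't schedule two instances of the app on the same machine
Conflicts=app@*.service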
Step three: load balancer registration
Once the fleet service has started the Docker container, it registers itself with etcd, the key-value store, recording its IP address and Docker-assigned port. Our HAProxy container notices this change, reloads its configuration, and starts routing traffic to the new service. At this point both the new and old code are running side by side.
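The regenerated HAProxy configuration at this moment might look something like the fragment below, with one server line per entry in etcd. The backend name, server names, and addresses are illustrative:

```
# Hypothetical haproxy.cfg fragment generated from etcd entries
backend app
    balance roundrobin
    # old deploy, still in rotation
    server app-9f3c2a1 10.0.1.5:32768 check
    # new deploy, just registered in etcd
    server app-abc1234 10.0.1.7:32901 check
```

Because both deploys appear as healthy servers in the same backend, traffic shifts over without dropping requests.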
Step four: promotion
Once the new code is running and requests are being routed to it successfully, it's time to remove the old services. We use a custom tool called "fluster" to promote the new service and destroy references to the older ones. The old services wind down and remove their entries from etcd, causing HAProxy to take them out of rotation. The deploy is now complete!
In the next parts of this series we'll dig into each of these layers: setting up the CoreOS cluster, configuring HAProxy, ELB, and Route53, and building a deployment server. We'll dive into the nuances of CoreOS, such as automatic updates and graceful restarts. We'll also be open-sourcing many of our fleet management tools, including "fluster", a lightweight abstraction on top of etcd.