Recent blog entries for jmeskill

Blue/Green Deploys with Kubernetes and Amazon ELB

At Octoblu, we deploy very frequently and we’re tired of our users seeing the occasional blip when a new version is put into production.

Though we use Amazon OpsWorks to manage our infrastructure more easily, updates can take a while because dependencies have to be installed before the service restarts, which is not a great experience.

Enter Kubernetes.

We knew that moving to an immutable infrastructure approach would help us deploy our apps, which range from extremely simple web services to complex near-real-time messaging systems, more quickly and easily.

Containerization is the future of app deployment, but managing and scaling a bunch of Docker instances, along with all their port mappings, is not a simple proposition.

Kubernetes simplified that part of our deployment strategy. However, we still had a problem: while Kubernetes is spinning up new versions of our Docker instances, we could enter a state where old and new versions were in the mix. If we shut down the old before bringing up the new, we would also have a brief (sometimes not so brief) period of downtime.

Blue/Green Deploys

I first read about Blue/Green deploys in Martin Fowler’s excellent article BlueGreenDeployment, a simple but powerful concept. We started to build out a way to do this in Kubernetes. After some complicated attempts, we came up with a simple idea: use Amazon ELBs as the router. Kubernetes handles the complexities of routing your request to the appropriate minion by listening on a given port on all minions, making ELB load balancing a piece of cake. Have the ELB listen on ports 80 and 443, then route the request to the Kubernetes port on all minions.

Blue or Green?

The next problem was figuring out whether blue or green is currently active. Another simple idea: store a blue port and a green port as tags on the ELB, and look at the current configuration of the ELB to see which one is live. There is no need to store the value somewhere else where it may not be accurate.
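As a minimal sketch of that decision in bash, assume the two tag values and the port the ELB currently forwards to have already been fetched. The function name choose_new_color and the port numbers are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: decide which color to deploy next.
# Arguments: blue port tag, green port tag, port the ELB currently forwards to.
# Prints "<color> <port>" for the color that is NOT currently live.
choose_new_color() {
  local blue_port=$1 green_port=$2 live_port=$3
  if [ "$live_port" = "$blue_port" ]; then
    echo "green $green_port"
  else
    # Also the fallback when the live port matches neither tag.
    echo "blue $blue_port"
  fi
}

# Example with made-up ports: blue (30000) is live, so green is next.
choose_new_color 30000 30001 30000
```

Because the live color is derived from the ELB itself, nothing has to be kept in sync by hand.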

Putting it all together

We currently use a combination of Travis CI and Amazon CodeDeploy to kick off the blue/green deploy process.

The following is part of a script that runs on our Trigger Service deploy. You can check out the code on GitHub if you want to see how it all works together.

I’ve added some annotations to help explain what is happening.

#!/bin/bash

SCRIPT_DIR=$(dirname "$0")
DISTRIBUTION_DIR=$(dirname "$SCRIPT_DIR")

export PATH=/usr/local/bin:$PATH
export AWS_DEFAULT_REGION=us-west-2

# Query ELB to get the port stored in the "blue" tag
BLUE_PORT=$(aws elb describe-tags --load-balancer-name triggers-octoblu-com | jq '.TagDescriptions[0].Tags[] | select(.Key == "blue") | .Value | tonumber')

# Query ELB to get the port stored in the "green" tag
GREEN_PORT=$(aws elb describe-tags --load-balancer-name triggers-octoblu-com | jq '.TagDescriptions[0].Tags[] | select(.Key == "green") | .Value | tonumber')

# Query ELB to figure out the port it currently forwards to
OLD_PORT=$(aws elb describe-load-balancers --load-balancer-name triggers-octoblu-com | jq '.LoadBalancerDescriptions[0].ListenerDescriptions[0].Listener.InstancePort')

# figure out if the new color is blue or green
NEW_COLOR=blue
NEW_PORT=${BLUE_PORT}
if [ "${OLD_PORT}" == "${BLUE_PORT}" ]; then
  NEW_COLOR=green
  NEW_PORT=${GREEN_PORT}
fi

export BLUE_PORT GREEN_PORT OLD_PORT NEW_COLOR NEW_PORT

# crazy template stuff, don't ask.
# (It expands $VAR and ${VAR} in the templates from the environment;
# a backslash-escaped \$VAR is left as a literal $VAR.)
#
# Some people, when confronted with a problem,
# think "I know, I'll use regular expressions."
# Now they have two problems.
# -- jwz
REPLACE_REGEX='s;(\\*)(\$([a-zA-Z_][a-zA-Z_0-9]*)|\$\{([a-zA-Z_][a-zA-Z_0-9]*)\})?;substr($1,0,int(length($1)/2)).($2&&length($1)%2?$2:$ENV{$3||$4});eg'
# Quote the regex so the shell does not glob or split it
perl -pe "$REPLACE_REGEX" "$SCRIPT_DIR/triggers-service-blue-service.yaml.tmpl" > "$SCRIPT_DIR/triggers-service-blue-service.yaml"
perl -pe "$REPLACE_REGEX" "$SCRIPT_DIR/triggers-service-green-service.yaml.tmpl" > "$SCRIPT_DIR/triggers-service-green-service.yaml"

# Always recreate the service for the new color
kubectl delete -f $SCRIPT_DIR/triggers-service-${NEW_COLOR}-service.yaml
kubectl create -f $SCRIPT_DIR/triggers-service-${NEW_COLOR}-service.yaml

# destroy the old version of the new color
kubectl stop rc -lname=triggers-service-${NEW_COLOR}
kubectl delete rc -lname=triggers-service-${NEW_COLOR}
kubectl delete pods -lname=triggers-service-${NEW_COLOR}
kubectl create -f $SCRIPT_DIR/triggers-service-${NEW_COLOR}-controller.yaml

# wait (up to 20 attempts, 10 seconds apart) for Kubernetes to bring up the instances properly
x=0
KUBE_STATUS=
while [ "$x" -lt 20 ] && [ -z "$KUBE_STATUS" ]; do
   x=$((x+1))
   sleep 10
   echo "Checking kubectl status, attempt ${x}..."
   KUBE_STATUS=$(kubectl get pod -o json -lname=triggers-service-${NEW_COLOR} | jq ".items[].currentState.info[\"triggers-service-${NEW_COLOR}\"].ready" | uniq | grep true)
done

if [ -z "$KUBE_STATUS" ]; then
  echo "triggers-service-${NEW_COLOR} is not ready, giving up."
  exit 1
fi

# remove the port mappings on the ELB
aws elb delete-load-balancer-listeners --load-balancer-name triggers-octoblu-com --load-balancer-ports 80
aws elb delete-load-balancer-listeners --load-balancer-name triggers-octoblu-com --load-balancer-ports 443

# create new port mappings
aws elb create-load-balancer-listeners --load-balancer-name triggers-octoblu-com --listeners Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=${NEW_PORT}
aws elb create-load-balancer-listeners --load-balancer-name triggers-octoblu-com --listeners Protocol=HTTPS,LoadBalancerPort=443,InstanceProtocol=HTTP,InstancePort=${NEW_PORT},SSLCertificateId=arn:aws:iam::822069890720:server-certificate/startinter.octoblu.com

# reconfigure the health check
aws elb configure-health-check --load-balancer-name triggers-octoblu-com --health-check Target=HTTP:${NEW_PORT}/healthcheck,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2
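
The wait loop in the script stops as soon as grep finds a single "true", so one ready pod is enough even if others are not ready yet. If every pod should be ready before the ELB is switched, a stricter check might look like the sketch below. The helper all_ready is hypothetical and not part of the original script; it assumes one ready value per line on stdin, which is what the jq filter prints.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if stdin has at least one line
# and every line is exactly "true".
all_ready() {
  local seen=0 line
  while IFS= read -r line; do
    seen=1
    [ "$line" = "true" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Two ready pods: the check passes.
printf 'true\ntrue\n' | all_ready && echo "all pods ready"
# One pod not ready yet: the check fails.
printf 'true\nfalse\n' | all_ready || echo "still waiting"
```

In the loop, the jq output could be piped into all_ready in place of the uniq and grep stages, with the loop continuing until it succeeds.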

Oops happens!

Sometimes Peter makes a mistake, and we have to quickly roll back to a prior version. If that version is still running on the off cluster (the color that is not currently live), rollback is as simple as re-mapping the ELB to forward to the old ports. Sometimes Peter tries to fix his mistake with a new deploy, and now we have a real mess.

Because this happened more than once, we created oops. Oops lets us instantly roll back to the off cluster by executing oops-rollback, or quickly re-deploy a previous version with oops-deploy git-commit.

We add an .oopsrc to all our apps that looks something like this:

{
"elb-name": "triggers-octoblu-com",
"application-name": "triggers-service",
"deployment-group": "master",
"s3-bucket": "octoblu-deploy"
}

oops list will show us all available deployments.
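
For the simple case of re-mapping the ELB to the old ports, the manual steps might look like the sketch below. This is not how oops is implemented; it is a hypothetical dry run that reuses the same AWS CLI calls as the deploy script and only prints them. The function name and the port 30000 are made up, and the HTTPS listener is left out because it also needs the certificate ARN.

```shell
#!/bin/bash
# Hypothetical sketch: print the AWS CLI calls that would point an ELB's
# port 80 listener and health check back at the previous instance port.
# It only echoes the commands (a dry run); drop "echo" to execute them.
print_rollback_commands() {
  local elb_name=$1 old_port=$2
  echo aws elb delete-load-balancer-listeners \
    --load-balancer-name "$elb_name" --load-balancer-ports 80
  echo aws elb create-load-balancer-listeners \
    --load-balancer-name "$elb_name" \
    --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=${old_port}"
  echo aws elb configure-health-check \
    --load-balancer-name "$elb_name" \
    --health-check "Target=HTTP:${old_port}/healthcheck,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2"
}

# Example: show what a rollback to a made-up old port would run.
print_rollback_commands triggers-octoblu-com 30000
```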

We are always looking for ways to get better results. If you have suggestions, let us know.

Syndicated 2015-06-13 17:31:22 from Jade Meskill

Programming Philosophy

A few months back I started an internal weekly mailing list at Octoblu sharing my views on Programming Philosophy. I want to share those ideas a little more broadly and get some new perspectives.

Programming is deeply creative and philosophical work; unfortunately, we don’t share our beliefs widely enough.

You can join the Facebook Group, or subscribe to Programming Philosophy, a newsletter I just launched that is open to anyone. You can also check out the Programming Philosophy Archive.

Syndicated 2015-02-02 08:16:23 from Jade Meskill

What’s in a name? Judd Jacobs.

I’ve been working with a very interesting individual lately, Judd Jacobs. He always has something clever to say.

We’re walking off a salad cliff… together

Suppressed is like depressed but they got sus’d instead

Always remember the name Judd Jacobs for your comedic sayings needs.

Syndicated 2014-03-07 22:56:20 from Jade Meskill

Twenty-Something Theses of Autonomy

I believe that we need a radically different organization than what exists in the world today. In order to build the new economy (and thus a new world), our ideas of how an organization works must be challenged (“You can’t make an omelete [sic] without nuking the existing social order”). A keystone of this “new way” is Autonomy. In order to get the best results, Freedom is essential. I have begun the process of capturing my theory in my “Twenty-Something Theses of Autonomy.” This list will evolve as I expand on each of these Theses; however, I want to begin the improvement process now by starting a discussion.

Do you see anything obvious missing? What has your experience taught you? Let’s talk.

Twenty-Something Theses of Autonomy

  • Customer Delight Cannot Exceed Worker Delight
  • Fully Engaged + Fully Present = Fully Human
  • Humans Own Outcomes
  • Creativity Seeks Free Spirits
  • Nonlinear Innovation Needs Creativity
  • Innovation Breeds Failure Breeds Innovation
  • Community Improves Results (and Expedites Failure)
  • Fear is the Org Killer
  • Telling Triples Turnover
  • Demanding Delivers Dummies
  • Teams Solve Difficult Problems
  • Autonomy Trumps Hegemony
  • Ivory Towers Are For Wizards (and Look Where That Got Saruman)
  • Only Gamblers Pick Winners
  • Diversity Wins
  • The Best Ideas are at the Market
  • Heterogeneous Systems Increase Effectiveness (over time)
  • Simple is Better
  • Maximize Laziness
  • Effort is Expensive
  • Results > Effort
  • Only Results Matter
  • Adults Come to Work
  • Team = Product
  • (Team + Product)^n = Organization
  • Leaders Don’t Manage
  • Results Cover a Multitude of Sins

Syndicated 2014-01-27 15:25:24 from Jade Meskill

Gangplank on Arizona Horizon

I appeared on Arizona Horizon on PBS on Wednesday. You can watch the video here.

Syndicated 2013-09-14 04:52:50 from Jade Meskill

Hacking The Future of Humanity at TEDxLivermore

On June 8th, 2013, I spoke at TEDxLivermore about Hacking the Future of Humanity. The videos were recently published, and now you can watch the talk for yourself.