26 Aug 2008 nbm   » (Journeyer)

Wordpress.com scalability at WordCamp SA 2008

At WordCamp South Africa 2008, held in Cape Town yesterday, we were given a brief overview of how Wordpress.com is set up to scale.

Matt Mullenweg set the scene with some idea of just how huge Wordpress.com is.  I may be misremembering a few of the numbers mentioned, but there have been something like 6.5 billion page views on Wordpress.com since the beginning of the year, there are 3.8 million Wordpress.com-hosted blogs (only Blogger is bigger), and 1.4 billion words have been written in posts on Wordpress.com.

Warwick Poole then gave us some more in-depth numbers, although, judging by the audience's reaction, pointing out that Wordpress.com was bigger than AdultFriendFinder got the scale across well enough on its own.  In May 2008, Wordpress.com served 693 million page views, rising to 812 million page views in July.  Over 1TB of media was uploaded in May, and 1.3TB in July.  In May, 417TB of traffic left the Wordpress.com data centres.  These numbers are available in the "July wrap-up" post on the Wordpress.com web log.

Apparently, across the approximately 710 servers, 10 000 web requests and 10 000 database requests are handled per second (I didn't think to write down whether this was the average or the peak).  110 requests per second go to Amazon's S3 storage service, while 3TB of media is cached on their own media caches.  They output 1.5TB/s (I wrote TB, so it probably is TB and not Tb; I'm guessing this is peak).  They experience approximately 5 server failures a week.
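As a back-of-envelope check on those figures, here is the per-server share of that load, treating the 710 servers as one homogeneous pool (which they certainly are not — different servers will have different roles):

```python
# Rough per-server averages from the numbers above; this assumes every
# server handles an equal share, which is a simplification.
servers = 710
web_rps = 10_000   # web requests per second, from the talk
db_rps = 10_000    # database requests per second, from the talk

web_per_server = web_rps / servers
db_per_server = db_rps / servers

print(f"~{web_per_server:.1f} web req/s and ~{db_per_server:.1f} DB req/s per server")
```

So on average each server sees only about 14 web requests per second, which shows how much headroom a fleet that size gives you for traffic spikes and the roughly five weekly server failures.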

How is it put together?  They use round-robin DNS to determine the data centre (from testing, it seems they round-robin six IPs - two IPs for each of three data centres).  The request then hits a load balancer built from some combination of nginx, wackamole, and spread.  They use Varnish for serving media, at least, and currently use LiteSpeed web servers.  They also use MySQL and memcached.
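The round-robin DNS part can be illustrated with a minimal sketch.  The IP addresses and data-centre names below are made up (the real ones weren't given in the talk); the point is just that six A records rotate, spreading clients across three data centres with no awareness of load or proximity:

```python
import itertools

# Hypothetical addresses: two IPs per data centre, three data centres.
DATA_CENTRES = {
    "dc1": ["192.0.2.1", "192.0.2.2"],
    "dc2": ["198.51.100.1", "198.51.100.2"],
    "dc3": ["203.0.113.1", "203.0.113.2"],
}

# Flatten to the six records a round-robin DNS answer rotates through.
records = [ip for ips in DATA_CENTRES.values() for ip in ips]
rotation = itertools.cycle(records)

# Each "lookup" hands back the next IP in the rotation.
first_six = [next(rotation) for _ in range(6)]
print(first_six)
```

Six consecutive lookups hit all six IPs once, so roughly a third of clients land at each data centre; anything smarter (health checks, failover) has to happen at the load-balancer layer behind those IPs.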

They use (and developed) the batcache Wordpress plugin to serve content from memcached.  According to the documentation, batcache only potentially serves stale content to first-time visitors; visitors who have interacted with the web log receive up-to-date content.
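The policy described above can be sketched in a few lines.  This is not batcache's actual implementation — the cookie name, TTL, and cache structure here are assumptions for illustration — but it captures the idea: visitors who have interacted carry a cookie and always get a fresh render, while anonymous visitors may get a cached copy:

```python
import time

render_count = 0   # lets us tell fresh renders apart in this sketch
cache = {}         # in-process dict standing in for memcached
CACHE_TTL = 300    # seconds; an assumed value, not the real setting

def render_page(path):
    # Stands in for WordPress generating the page from the database.
    global render_count
    render_count += 1
    return f"render #{render_count} of {path}"

def handle_request(path, cookies):
    # Visitors who have interacted with the blog (e.g. commented) carry
    # a cookie (hypothetical name) and always get fresh content.
    if "wordpress_interacted" in cookies:
        return render_page(path)
    # Anonymous first-time visitors may get a cached, possibly slightly
    # stale, copy.
    entry = cache.get(path)
    if entry is not None and time.time() - entry[1] < CACHE_TTL:
        return entry[0]
    page = render_page(path)
    cache[path] = (page, time.time())
    return page
```

Two anonymous requests for the same path return the same cached render, while a request carrying the interaction cookie bypasses the cache entirely - which is why commenters never see their own comment missing from a stale page.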

When new media is uploaded, its existence and initial location are stored in a table.  As necessary, the other data centres create their own local copies of that media and update that table.  The backup media stores in the data centres are append-only: apparently nothing is ever deleted from them.
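That lazy replication scheme can be sketched as follows.  The table structure and function names are assumptions, not the actual WordPress.com schema; the point is that a shared table tracks which data centres hold a copy, and copies are made on demand rather than eagerly at upload time:

```python
# Shared table: filename -> set of data centres holding a copy.
media_table = {}

def upload(filename, origin_dc):
    # On upload, record the file's existence and its initial location.
    media_table[filename] = {origin_dc}

def fetch(filename, local_dc):
    locations = media_table[filename]
    if local_dc not in locations:
        # First request at this data centre: pull the file from any
        # data centre that has it, then record the new local copy.
        source = next(iter(locations))
        locations.add(local_dc)
        return f"copied {filename} from {source} to {local_dc}"
    # Subsequent requests are served from the local copy.
    return f"served {filename} locally from {local_dc}"
```

The first fetch from a new data centre triggers a copy and updates the table; every later fetch is served locally, so only media that a data centre actually needs gets replicated to it.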

That's about all I wrote down, but there's quite a bit more about how Wordpress.com is set up and the sort of load/traffic it handles on the Wordpress.com blog and on the blogs of various employees (such as this post on nginx replacing Pound, this one on Pound, and another on Varnish), which will probably inform some of the technology choices we make at SynthaSite.

Syndicated 2008-08-24 17:13:34 from Neil Blakey-Milner
