29 May 2008 nbm   » (Journeyer)

Google I/O: Underneath the covers at Google

This session deserves a much longer post, but I just wanted to put down the most interesting stuff quickly.  Basically, it was a back-end developer's guide to how Google is put together - from how a request that someone makes in a browser gets a response, to how those responses are put together from multiple sources, and how those sources are built up.

Everyone knows Google's love of lots of commodity hardware for their servers, but it was interesting to hear some other things - they use reasonably low-end networking gear too.  Also, they're back where they started in terms of machines without cases shoved into in-house-designed racks.  The scale has changed dramatically, of course.

"If you have 10k servers, expect to lose 10 a day..."

GFS masters run on the same server hardware as the slaves, and take part in master election like any other machine.  Google puts "millions" of pages together in a single GFS "file", since GFS uses 64MB chunks.  There are 200+ clusters, many of them with 1000s of machines, serving pools of 1000s of clients.  Filesystems are 4+PB, with 40GB/s of read/write load (even while hardware is failing constantly).
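
To get a feel for those numbers, here is a quick back-of-envelope sketch.  It assumes binary units (1PB = 2^50 bytes, 1MB = 2^20 bytes), ignores replication, and assumes the quoted "10 a day per 10k servers" rate applies evenly to any cluster size - none of which the talk spelled out.  The function names are mine, just for illustration.

```python
import math

CHUNK_BYTES = 64 * 2**20   # 64MB GFS chunk size, as quoted in the talk
PB = 2**50                 # assumption: binary petabytes

# Quoted rule of thumb: with 10k servers, expect to lose 10 a day.
FLEET_SIZE = 10_000
FAILURES_PER_DAY = 10


def chunks_needed(size_bytes, chunk_bytes=CHUNK_BYTES):
    """Number of chunks needed to hold size_bytes (ignoring replication)."""
    return math.ceil(size_bytes / chunk_bytes)


def expected_daily_failures(servers):
    """Expected machines lost per day, scaling the quoted rate linearly."""
    return servers * FAILURES_PER_DAY / FLEET_SIZE


if __name__ == "__main__":
    print("chunks in a 4PB filesystem:", chunks_needed(4 * PB))
    print("expected failures/day in a 5000-machine cluster:",
          expected_daily_failures(5000))
```

So a single 4PB filesystem is tens of millions of chunks, and a cluster of a few thousand machines should expect to lose a handful of them every day - which is why everything has to keep working while hardware is failing.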

MapReduce usage within Google is growing fast - 700 new applications in a recent month at peak, currently around 10k applications.  The job count went from 171k MapReduce jobs in March 2006 to 2.2 million jobs in September 2007, roughly a thirteenfold increase in 18 months.  MapReduce is heavily optimised to keep jobs near the data they need, to conserve precious network bandwidth within the datacentre.
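
The talk didn't go into how the scheduler actually does this, so what follows is only a toy sketch of the general idea of data locality: when handing out work, prefer an idle worker that already holds a replica of the chunk, and only fall back to a remote worker if none is free.  The function, the machine names and the greedy strategy are all my own assumptions, not Google's implementation.

```python
def assign_tasks(chunk_locations, idle_workers):
    """Greedily assign each chunk to one idle worker.

    chunk_locations: dict mapping chunk id -> set of machines holding a replica
    idle_workers:    list of machines currently free to run a task

    Returns a dict mapping chunk id -> (worker, is_local).  Chunks left over
    once the workers run out are simply not assigned.
    """
    free = list(idle_workers)
    assignment = {}
    for chunk, holders in chunk_locations.items():
        if not free:
            break
        # Prefer a worker that already has the data on local disk.
        local = [w for w in free if w in holders]
        worker = local[0] if local else free[0]
        free.remove(worker)
        assignment[chunk] = (worker, bool(local))
    return assignment


if __name__ == "__main__":
    # Hypothetical cluster state, for illustration only.
    locations = {"c1": {"m1", "m2"}, "c2": {"m3"}, "c3": {"m9"}}
    for chunk, (worker, is_local) in assign_tasks(locations, ["m2", "m3", "m4"]).items():
        print(chunk, "->", worker, "(local)" if is_local else "(remote read)")
```

Every task that lands on a machine already holding its chunk is 64MB that doesn't have to cross the network.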

Google still has one large shared source base(!), from low-level libraries used by everything, to domain-specific libraries, to applications.  The benefits are that it's easy to find examples of how something is used so you can use it correctly, and easy to reuse code (i.e., as a library).  The drawback is that such reuse causes some fairly tangled dependencies.

Language usage at Google: C++ for all high-performance, commonly-accessed web stuff.  Java is used for less-performance-oriented and/or lower-volume applications.  Python is used behind the scenes for things like configuration, administration, &c.

Syndicated 2008-05-29 01:39:39 from Cosmic Seriosity Balance
