Rough Estimates of the Dollar Cost of Scaling Web Platforms - Part I
I have been pondering the idea behind this article for a while, and finally had a bit of time to implement it.
The basic idea is this: certain platforms have higher costs in terms of memory per concurrent connection. Those translate into increased costs in dollar terms.
Nota Bene: Having run LangPop.com for some time, I'm used to people getting hot and bothered about this or that aspect of statistics that are rough in nature, so I'm going to try and address those issues from the start, with more detail below.
Constructive criticism is welcome. I expect to utilize it to revisit these results and improve them. Frothing at the mouth is not welcome.
There is something of a "comparing apples and oranges" problem inherent in doing these sorts of comparisons. As an example, Rails gives you a huge amount of functionality "out of the box", whereas Mochiweb does much less. More on that below.
I am not familiar with all of these systems: meaning that I may not have configured them as I should have. Helpful suggestions are, of course, welcome. Links to source code are provided below.
You can likely handle many more 'users' than concurrent connections: 'concurrent connections' here means multiple browsers actively connected to the site at the same moment.
Programmer costs are probably higher than anything else, so more productive platforms can save a great deal of money, which more than makes up for the cost of extra memory. There's a reason that most people, outside of Google and Yahoo and sites like that, don't use much C for their web applications. Indeed, I use Rails myself, even though it uses a lot of memory and isn't terribly fast: I'd rather get sites out there, see how they do, and then worry about optimizing them (which is of course quite possible in Rails).
All tests were run like so: my new laptop, with two cores and four gigs of memory, acted as the server, and my older laptop ran the ab (Apache Benchmark) program; they're connected via ethernet. I stepped through successive levels of concurrency, running first 1 concurrent connection, then 2, then 10, and so on. The "server" computer is running Ubuntu 9.10, "karmic".
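The benchmark loop can be sketched roughly like this; note that the target URL, request count, and exact concurrency levels below are placeholders, not the precise values used in the tests:

```shell
# Step through successive concurrency levels with ab (Apache Benchmark).
# The echo makes this a dry run that just prints each command; remove
# it to actually fire the requests at a real server.
for c in 1 2 10 50 100; do
    echo ab -n 1000 -c "$c" http://192.168.1.10:8080/
done
```

Each real run's output would then be saved off and the server's memory usage recorded at that concurrency level.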
The platforms I tested:
Apache 2.2, running the worker MPM, serving static files.
Nginx 0.7.62, serving static files.
Mochiweb from svn (revision 125), serving static files.
Jetty 6.1.20-2, serving static files.
Rails 2.3.5, serving up a simple template with the current date and time.
PHP 5.2.10.dfsg.1-2ubuntu6.3, serving up a single php file that prints the current date and time.
Django 1.1.1-1ubuntu1, serving up a template with the date and time.
Mochiweb, serving a simple template (erltl) with the date and time.
Jetty, serving a simple .war file containing a JSP file, with, as clever observers will have surmised, the date and time.
As stated above, it's pretty obvious that using Rails or Django for something so simple is overkill:
Better Tests for the Future
I would like to run similar tests with a more realistic application, but I simply don't have the time or expertise to sit down and write a blog, say, for all of the above platforms. If I can find a few volunteers, I'd be happy to discuss some rough ideas about what those tests ought to look like. Some ideas:
They should test the application framework with a realistic, real world type of example.
The data store should figure as little as possible - I want to concentrate on testing the application platform for the time being, rather than Postgres vs Sqlite vs Redis. Sqlite would probably be a good choice to utilize for the data store.
Since this first test is so minimalistic, I think a second one ought to be fairly inclusive, making use of a fair amount of what the larger systems like Rails, Django and PHP offer.
I'd also be interested in seeing other languages/platforms.
The Holy Grail would be to script all these tests so that they're very easy to run repeatably.
With that out of the way, I do think the results are meaningful, and reflect something of what I've seen on various platforms in the real world.
First of all, here we look at the total "VSZ" (as ps reports it), or Virtual Size, of the process(es) in memory. Much of this may be shared between processes, via shared libraries and "copy on write" where applicable.
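One way to capture that figure is a quick sketch like the following, which sums the VSZ column that ps reports. Here the current shell's own PID stands in for the server's processes, just so the snippet runs anywhere:

```shell
# Sum the VSZ (virtual size, in kB) that ps reports for a set of
# processes. $$ (this shell) stands in for the server's PIDs; for a
# real server you might select by name, e.g. `ps -C nginx -o vsz=`.
ps -o vsz= -p $$ | awk '{ total += $1 } END { print total + 0, "kB" }'
```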
The results are striking: Rails, followed by Django and PHP, eats up a lot of memory for each new concurrent connection. Rails, which I know fairly well, most likely suffers from several problems: 1) it includes a lot of code, which is actually a good thing if you're building a reasonably sized app that makes use of all it has to offer; and 2) its garbage collector doesn't play well with "copy on write", which is what "Enterprise Ruby" aims to fix. Django and PHP are also fairly large, capable platforms compared to something small and light like Mochiweb.
Excuses aside, Erlang and Mochiweb are very impressive in how little additional memory they use when additional concurrent connections are thrown at them. I was also impressed with Jetty. I don't have a lot of experience with Java on the web (I work more with J2ME for mobile phones), so I expected something a bit more "bloated", which is the reputation Java has. As we'll see below, Jetty does take up a lot of initial memory, but subsequent concurrent connections don't appear to cost much. Of course, this is likely another 'apples and oranges' comparison, and it would be good to test a complete Java framework rather than just a tiny web app with one JSP file.
So what does this mean in real-world terms of dollars and cents? As your Rails application gets more popular, you're going to have to invest relatively more money, in terms of memory, to make it scale.
For this comparison, I used the bytes per dollar I'm getting for my Linode, which works out to 18,889,040.85 bytes per dollar ($79.95 a month for 1440 MB).
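As a sanity check, that figure can be reproduced roughly from the plan's numbers. This assumes 1 MB means 1024 * 1024 bytes; small differences from the figure quoted above come down to rounding:

```shell
# Rough bytes-per-dollar for the Linode plan: 1440 MB for $79.95/month.
# Treating 1 MB as 1024 * 1024 bytes; expect minor rounding drift
# versus the exact figure quoted in the text.
awk 'BEGIN { printf "%.2f bytes per dollar\n", 1440 * 1024 * 1024 / 79.95 }'
```

Dividing a platform's marginal memory per connection by this figure then gives a rough dollar cost per additional concurrent user.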
As we can see, handling a similar number of concurrent users is essentially free with Mochiweb, whereas with Rails it carries a significant cost. This information is particularly relevant when deciding how to monetize a site: with Erlang and Jetty, scaling up to lots of users appears relatively cheap, so even a small amount of revenue per user per month will turn a profit, whereas with Rails, scaling up to huge numbers of users is going to be more expensive, so revenue streams such as advertising may not be as viable. It's worth noting that 37signals, the company that created Rails, is a vocal supporter of charging money for products.
There's another interesting statistic that I wanted to include as well. The previous graph shows the average cost per additional concurrent user, but this one shows how much the platform costs when there is just one user, so it acts as a sort of baseline:
As we can see, Jetty is particularly expensive from this point of view. The default settings (on Ubuntu) seem to indicate that, for instance, the basic $20 a month Linode package would not be sufficient to run Jetty, plus a database, plus other software. I think that the Apache Worker number is off a bit, and may reflect settings made to handle a large number of connections, or perhaps a different MPM would make sense.
Source Code / Spreadsheet
The spreadsheet I put together is here: http://spreadsheets.google.com/ccc?key=0An76R90VwaRodElEYjVYQXpFRmtreGV3MEtsaWYzbXc&hl=en
And the source code (admittedly not terribly well organized) is here: http://github.com/davidw/marginalmemory/
Syndicated 2010-01-12 11:39:37 (Updated 2010-01-12 14:20:27) from David's Computer Stuff Journal