Older blog entries for davidw (starting at number 380)

24 Feb 2010 (updated 24 Feb 2010 at 12:16 UTC)

Google execs convicted

In an update to an earlier article I posted, it appears that the Google executives in question have been convicted:

http://www.corriere.it/salute/disabilita/10_febbraio_24/dirigenti-google-condannati_29ebaefe-2122-11df-940a-00144f02aabe.shtml (in Italian)

They were convicted of having failed to block the publication of a video showing some teenagers picking on and hitting another minor with Down's syndrome.

It will be interesting to see how Google reacts. Apparently, the court believes that Google is criminally responsible for the videos its users happen to post, which would mean that, in theory, Google has to personally review every submitted video to determine whether its content infringes on someone's rights.

Update:

Here's a New York Times link:

http://www.nytimes.com/aponline/2010/02/24/business/AP-EU-Italy-GoogleTrial.html


Update 2:

"cate" posted a link to Google's official response: http://googleblog.blogspot.com/2010/02/serious-threat-to-web-in-italy.html

Also, it's really incredible to read the comments here (in Italian): http://vitadigitale.corriere.it/2010/02/processo_vivi_down_google_cond.html

Most of them are against this ruling, but a significant number think it's a good thing, which just goes to show that you can't put all the blame on politicians for Italy's woes: someone is voting for them, after all.

Syndicated 2010-02-24 09:10:33 (Updated 2010-02-24 11:27:01) from David's Computer Stuff Journal

3 Feb 2010 (updated 3 Feb 2010 at 23:06 UTC)

Italy vs Google

I'm starting to notice a pattern here:

  • Google executives are on trial because some sorry excuses for human beings picked on a minor with Down's syndrome and posted the video to YouTube: http://news.bbc.co.uk/2/hi/technology/8115572.stm - this one is simply preposterous. Going after the execs of a company that did nothing to aid, abet, condone, or in any way facilitate the abuse in question is absurd, and if extended to other industries would mean that you could attack pretty much any company whose products happened to figure in a crime. Kitchen knives, hunting rifles, golf clubs, even automobiles would seem fair game.
  • Italy is going after "user generated content" sites like YouTube and wants to force them to register with the government if they wish to operate: http://arstechnica.com/tech-policy/news/2010/02/italy-preparing-to-hold-youtube-others-liable-for-uploads.ars
  • And last but not least, this hit piece in the normally respectable Corriere della Sera: http://www.corriere.it/economia/10_gennaio_28/mucchetti_4de4be8a-0be8-11df-bc70-00144f02aabe.shtml - it's in Italian, but the gist of it is that Mr Mucchetti really has it in for Google because they operate out of Ireland within the EU, whereas he believes they should be registered in Italy as a publisher, and be subject to Italy's myriad rules, regulations, and, of course, taxes regarding publishing. Despite, well, not really publishing much of anything themselves. He mentions "tax evasion" charges that had been considered, because the Italian division of Google is not where the AdSense revenue in Europe goes. I suppose he figures that since the ads are bought by residents of Italy, the money should somehow stay in Italy? He also huffs and puffs about Italy's antitrust laws, which, in the same piece, he admits were created with the express purpose of not touching existing companies (the market share limit was set higher than the share of the largest existing company). Perhaps he would do well to reflect on political schemes and carve-ups like that, and think about why companies like Google go to Ireland rather than Italy. He also makes some quick mentions of network neutrality, and rambles on a bit about how it's a battle between the "Obamanian, Californian, search engines" and the telecommunications industry, in "the rest of the world and above all in Europe". And of course he uses a liberal sprinkling of keywords like "globalization", "multinational corporations", and "deregulated" to try to paint Google as a big, evil company throwing its weight around. One wonders if there aren't more pressing problems with the Italian media industry, such as the prime minister owning a large chunk of it?

One way of seeing things is that politicians and businessmen in Italy noticed Google was actually making quite a bit of money, and even if they don't quite understand this internet thing, they want some of the loot.

And while Google certainly is becoming big enough to be cause for worry and discussion, the moves against them in Italy do not seem anything like a rational response calculated to offset severe failures in the market.

In any case, it will be interesting to see what happens.  Maybe, after China, we'll see Google quit Italy as well?

Syndicated 2010-02-03 21:26:25 (Updated 2010-02-03 22:26:20) from David's Computer Stuff Journal

Flippa experiment

I decided to try a little experiment with Flippa.com, a site where you can auction off domains or web sites.

I put http://www.innsbruck-apartments.com up for auction:


http://flippa.com/auctions/83341/Innsbruck-Austria-rental-listing-site---Ski-Season

We'll see how it goes, and whether Flippa is worth using for other sites that I'd like to sell.

It's a good test case, because it's a site I threw together years ago simply to aid our search for a new apartment in Innsbruck, and that friends then requested.

Syndicated 2010-01-25 09:34:08 (Updated 2010-01-25 09:36:38) from David's Computer Stuff Journal

12 Jan 2010 (updated 12 Jan 2010 at 15:08 UTC)

Rough Estimates of the Dollar Cost of Scaling Web Platforms - Part I

I have been pondering the idea behind this article for a while, and finally had a bit of time to implement it.

The basic idea is this: certain platforms have higher costs in terms of memory per concurrent connection. Those translate into increased costs in dollar terms.

Nota Bene: Having run LangPop.com for some time, I'm used to people getting hot and bothered about this or that aspect of statistics that are rough in nature, so I'm going to try to address those issues from the start, with more detail below.

  • Constructive criticism is welcome. I expect to utilize it to revisit these results and improve them. Frothing at the mouth is not welcome.
  • There is something of a "comparing apples and oranges" problem inherent in doing these sorts of comparisons. As an example, Rails gives you a huge amount of functionality "out of the box", whereas Mochiweb does much less. More on that below.
  • I am not familiar with all of these systems, which means that I may not have configured them as I should have. Helpful suggestions are, of course, welcome. Links to source code are provided below.
  • You can likely handle many more 'users' than concurrent connections (multiple browsers connected to the site at the same moment), since most users are not making a request at any given instant.
  • Programmer costs are probably higher than anything else, so more productive platforms can save a great deal of money, which more than makes up for the cost of extra memory.  There's a reason that most people, outside of Google and Yahoo and sites like that, don't use much C for their web applications.  Indeed, I use Rails myself, even though it uses a lot of memory and isn't terribly fast: I'd rather get sites out there, see how they do, and then worry about optimizing them (which is of course quite possible in Rails).

Methodology

All tests were run like so: my new laptop, with two cores and four gigs of memory, was used as the server, and my older laptop ran the ab (ApacheBench) program; the two are connected via Ethernet. I built up through successive levels of concurrency: first 1 concurrent connection, then 2, then 10, and so on. The "server" computer runs Ubuntu 9.10, "karmic".
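The concurrency ramp described above is easy to script. Below is a small Java sketch that only builds and prints the ab command lines for each level; `-n` (total requests) and `-c` (concurrent requests) are standard ab options, while the server URL, the request count and every level beyond 1, 2 and 10 are hypothetical placeholders, not the settings actually used.

```java
import java.util.ArrayList;
import java.util.List;

class AbRamp {
    // Build one ab invocation: -n total requests, -c concurrent requests.
    // ab refuses a concurrency level larger than the request count.
    static List<String> abCommand(String url, int requests, int concurrency) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ab");
        cmd.add("-n");
        cmd.add(Integer.toString(requests));
        cmd.add("-c");
        cmd.add(Integer.toString(concurrency));
        cmd.add(url);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical server address; only levels 1, 2 and 10 come from the text.
        String url = "http://192.168.1.10:8080/";
        int[] levels = {1, 2, 10, 50, 100};
        for (int c : levels) {
            List<String> cmd = abCommand(url, 1000, c);
            System.out.println(String.join(" ", cmd));
            // To actually run each step (requires ab to be installed):
            // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

Printing the commands rather than running them keeps the sketch harmless; uncommenting the `ProcessBuilder` line turns it into a real driver.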

Platforms

The platforms I tested:

  • Apache 2.2, running the worker MPM, serving static files.
  • Nginx 0.7.62, serving static files.
  • Mochiweb from svn (revision 125), serving static files.
  • Jetty 6.1.20-2, serving static files.
  • Rails 2.3.5, serving up a simple template with the current date and time.
  • PHP 5.2.10.dfsg.1-2ubuntu6.3, serving up a single php file that prints the current date and time.
  • Django 1.1.1-1ubuntu1, serving up a template with the date and time.
  • Mochiweb, serving a simple template (erltl) with the date and time.
  • Jetty, serving a simple .war file containing a JSP file, with, as clever observers will have surmised, the date and time.

As stated above, it's pretty obvious that using Rails or Django for something this simple is overkill.

Better Tests for the Future

I would like to run similar tests with a more realistic application, but I simply don't have the time or expertise to sit down and write, say, a blog for every one of the platforms above. If I can find a few volunteers, I'd be happy to discuss some rough ideas about what those tests ought to look like. Some ideas:

  • They should test the application framework with a realistic, real world type of example.
  • The data store should figure as little as possible: I want to concentrate on testing the application platform for the time being, rather than Postgres vs SQLite vs Redis. SQLite would probably be a good choice for the data store.
  • Since this first test is so minimalistic, I think a second one ought to be fairly inclusive, making use of a fair amount of what the larger systems like Rails, Django and PHP offer.
  • I'd also be interested in seeing other languages/platforms.
  • The Holy Grail would be to script all these tests so that they're very easy to run repeatably.

Results

With that out of the way, I do think the results are meaningful, and reflect something of what I've seen on various platforms in the real world.

First of all, we look at the total "VSZ" (as ps calls it), or virtual size, of the process(es) in memory. Much of this may be shared, through shared libraries and, where applicable, copy-on-write pages.
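For anyone who wants to reproduce this kind of measurement, here is a minimal sketch that totals VSZ across all processes with a given command name. It assumes a Linux procps `ps`, where `-C` selects processes by command name and `-o vsz=` prints only the VSZ column (in KiB) with no header. The default process name is a hypothetical placeholder, and this is not the script from the repository linked below.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class VszTotal {
    // Sum the VSZ values (KiB, one per line) printed by `ps -o vsz=`.
    static long totalVszKiB(String psOutput) {
        long total = 0;
        for (String line : psOutput.split("\n")) {
            String t = line.trim();
            if (!t.isEmpty()) {
                total += Long.parseLong(t);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical default; pass e.g. "beam.smp", "nginx" or "apache2".
        String name = args.length > 0 ? args[0] : "nginx";
        Process p = new ProcessBuilder("ps", "-o", "vsz=", "-C", name).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        System.out.println(name + ": " + totalVszKiB(out.toString()) + " KiB VSZ");
    }
}
```

Summing over every matching process matters for multi-process servers like Apache or Rails app pools, where a single PID would understate the total.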

The results are striking: Rails, followed by Django and PHP, eats up a lot of memory for each new concurrent connection. Rails, which I know fairly well, most likely suffers from several problems: 1) it includes a lot of code, which is actually a good thing if you're building a reasonably sized app that makes use of all it has to offer; 2) its garbage collector doesn't play well with copy-on-write, which is what Ruby Enterprise Edition aims to fix. Django and PHP are also fairly large, capable platforms compared to something small and light like Mochiweb.

That said, excuses aside, Erlang and Mochiweb are very impressive in how little additional memory they utilize when additional concurrent connections are thrown at them. I was also impressed with Jetty. I don't have a lot of experience with Java on the web (I work more with J2ME for mobile phones), so I expected something a bit more "bloated", which is the reputation Java has. As we'll see below, Jetty does take up a lot of initial memory, but subsequent concurrent connections appear to not take up much.  Of course, this is also likely another 'apples and oranges' comparison and it would be good to utilize a complete Java framework, rather than just a tiny web app with one JSP file.
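"Additional memory per concurrent connection" can be estimated from (concurrency, total memory) pairs. The spreadsheet's exact method isn't described here, so this sketch assumes one plausible approach: the least-squares slope of total memory against concurrency level. The sample numbers in `main` are invented for illustration, not the measured results.

```java
class MarginalMemory {
    // Least-squares slope of total memory (bytes) against concurrency level:
    // an estimate of how much memory each extra concurrent connection adds.
    static double slope(double[] concurrency, double[] bytes) {
        int n = concurrency.length;
        if (n < 2 || bytes.length != n) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += concurrency[i];
            meanY += bytes[i];
        }
        meanX /= n;
        meanY /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            double dx = concurrency[i] - meanX;
            num += dx * (bytes[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Invented samples for illustration only, not measured values.
        double[] c = {1, 2, 10, 50};
        double[] vsz = {120e6, 125e6, 165e6, 365e6};
        System.out.printf("~%.0f bytes per extra connection%n", slope(c, vsz));
    }
}
```

A fitted slope is less sensitive to a single noisy sample than the difference between two points, but it still assumes memory grows roughly linearly with concurrency.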

So what's this mean in real world terms of dollars and cents? As your Rails application gets more popular, you're going to have to invest relatively more money to make it scale, in terms of memory.

For this comparison, I used the bytes per dollar I'm getting for my Linode, which works out to 18,889,040.85 bytes per dollar per month ($79.95 for 1440 MB a month).
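The conversion from memory to dollars is simple division. A sketch, assuming 1 MB = 2^20 bytes; with that convention the plan above comes out at roughly 18.89 million bytes per dollar per month, close to the quoted figure (the trailing decimals depend on the unit convention). The 1000 connections at 5 MB each in `main` are hypothetical.

```java
class MemoryCost {
    static final double BYTES_PER_MB = 1024.0 * 1024.0; // assumes MB = 2^20 bytes

    // Bytes of RAM that one dollar buys per month on a given plan.
    static double bytesPerDollar(double planMb, double monthlyPrice) {
        return planMb * BYTES_PER_MB / monthlyPrice;
    }

    // Monthly dollar cost of holding `bytes` of memory at that rate.
    static double monthlyCost(double bytes, double bytesPerDollar) {
        return bytes / bytesPerDollar;
    }

    public static void main(String[] args) {
        double rate = bytesPerDollar(1440, 79.95); // the Linode plan quoted above
        System.out.printf("%.2f bytes per dollar per month%n", rate);
        // Hypothetical: 1000 extra concurrent connections at 5 MB each.
        double extra = 1000 * 5 * BYTES_PER_MB;
        System.out.printf("$%.2f per month%n", monthlyCost(extra, rate));
    }
}
```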

As we can see, supporting a similar number of concurrent users is essentially free for Mochiweb, whereas with Rails it has a significant cost. This information is particularly relevant when deciding how to monetize a site: with Erlang and Jetty, scaling up to lots of users appears to be relatively cheap, so even a small amount of revenue per user per month is going to be profit, whereas with Rails, scaling up to huge numbers of users is going to be more expensive, so revenue streams such as advertising may not be as viable. It's worth noting that 37signals, the company that created Rails, is a vocal supporter of charging money for products.
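The monetization argument boils down to comparing monthly revenue per user with the memory cost per user. A rough sketch that ignores every cost except memory; the users-per-connection ratio, the 5 MB per connection and the one cent of revenue in `main` are all hypothetical inputs, not figures from the measurements.

```java
class BreakEven {
    // Monthly memory cost attributable to one user. usersPerConnection is a
    // hypothetical ratio of users to concurrent connection slots.
    static double memoryCostPerUser(double bytesPerConnection, double bytesPerDollar,
                                    double usersPerConnection) {
        return bytesPerConnection / bytesPerDollar / usersPerConnection;
    }

    // A user "pays for" their memory if monthly revenue exceeds that cost.
    static boolean coversMemory(double monthlyRevenuePerUser, double costPerUser) {
        return monthlyRevenuePerUser > costPerUser;
    }

    public static void main(String[] args) {
        double bytesPerDollar = 18_889_040.85; // Linode figure quoted above
        // Hypothetical: 5 MB per connection, 100 users per connection slot,
        // $0.01 of ad revenue per user per month.
        double cost = memoryCostPerUser(5 * 1024 * 1024, bytesPerDollar, 100);
        System.out.printf("memory cost per user: $%.5f/month, covered: %b%n",
                cost, coversMemory(0.01, cost));
    }
}
```

A platform with a small per-connection footprint pushes the cost per user toward zero, which is why low-revenue models like advertising look more viable there.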

There's another interesting statistic that I wanted to include as well. The previous graph shows the average cost per additional concurrent user, but this one shows how much the platform costs when there is just one user, so it acts as a sort of baseline.

As we can see, Jetty is particularly expensive from this point of view.  The default settings (on Ubuntu) seem to indicate that, for instance, the basic $20 a month Linode package would not be sufficient to run Jetty, plus a database, plus other software.  I think that the Apache Worker number is off a bit, and may reflect settings made to handle a large number of connections, or perhaps a different MPM would make sense.
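The "would it fit on a small plan" question can be checked by adding up baseline footprints. A sketch under loudly hypothetical numbers: the 360 MB plan size, the 400 MB JVM, and the database and other-service sizes in `main` are placeholders, not measured or quoted values. Because VSZ counts shared and reserved-but-untouched mappings, a check based on VSZ errs on the pessimistic side.

```java
class BaselineFit {
    // Monthly dollar cost of a platform's single-connection footprint.
    static double baselineMonthlyCost(double baselineBytes, double bytesPerDollar) {
        return baselineBytes / bytesPerDollar;
    }

    // True if the combined footprints fit within the plan's memory.
    static boolean fitsInPlan(long planBytes, long... footprints) {
        long total = 0;
        for (long f : footprints) {
            total += f;
        }
        return total <= planBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical sizes: 360 MB plan; 400 MB JVM, 60 MB database, 30 MB other.
        long plan = 360 * mb;
        System.out.println("fits: " + fitsInPlan(plan, 400 * mb, 60 * mb, 30 * mb));
        System.out.printf("JVM baseline: $%.2f/month%n",
                baselineMonthlyCost(400 * mb, 18_889_040.85));
    }
}
```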

Source Code / Spreadsheet

The spreadsheet I put together is here: http://spreadsheets.google.com/ccc?key=0An76R90VwaRodElEYjVYQXpFRmtreGV3MEtsaWYzbXc&hl=en

And the source code (admittedly not terribly well organized) is here: http://github.com/davidw/marginalmemory/

Syndicated 2010-01-12 11:39:37 (Updated 2010-01-12 14:20:27) from David's Computer Stuff Journal

Detecting BlackBerry JDE Version

Recently, I went back and added some preprocessor code (it's pretty much necessary in the world of J2ME) to ensure that Hecl would compile with older versions of the BlackBerry JDE. However, I also faced a problem: how to figure out what version of the JDE we're using. It could be my latest cold clouding my mind, but I couldn't find a simple way to do this. It never seems to be simple with the BlackBerry platform, unfortunately.

I did, however, finally find a nice way to obtain this information programmatically: the bin/rapc.jar file, which ships with the JDE, contains a file called app.version, which, indeed, contains the version of the JDE in use. I hacked up some code to read it and print it out.
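A minimal Java sketch of the idea, using the standard `java.util.jar` API. It assumes that app.version sits at the root of rapc.jar and is plain text (decoded here as UTF-8); the default path is a hypothetical placeholder, and this is not the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JdeVersion {
    // Return the trimmed contents of the app.version entry inside rapc.jar.
    static String readAppVersion(String rapcJarPath) throws IOException {
        try (JarFile jar = new JarFile(rapcJarPath)) {
            JarEntry entry = jar.getJarEntry("app.version"); // assumed to be at the jar root
            if (entry == null) {
                throw new IOException("app.version not found in " + rapcJarPath);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8).trim();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical install path; pass the real bin/rapc.jar as the first argument.
        String path = args.length > 0 ? args[0] : "C:/BlackBerry JDE/bin/rapc.jar";
        System.out.println(readAppVersion(path));
    }
}
```

A build script could feed the printed version into the preprocessor step to pick the right code paths for older JDEs.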
