8 Nov 2004 Bram

Car Emissions

There are two concepts in car pollution which people generally get mixed up. Some exhaust gases are simply stinky and noxious, most notably particulate carbon and carbon monoxide. Those do direct damage to the humans near them and to the crops which grow nearby. These pollutants are clearly bad, and yet there isn't much direct economic incentive for any one person to make their car produce less of them.

The other troublesome kind of exhaust is greenhouse gases, mostly carbon dioxide. The amount of damage caused by these is much less clear, and there's a straightforward economic disincentive to produce them, because they correspond pretty much directly to the amount of gas your car consumes. Carbon dioxide also happens to be produced in mass quantities by respiration.

If you really want to know how clean a car is, look it up on the EPA web site. There are some surprises, for example the Honda Civic Hybrid with a manual transmission has mediocre pollution ratings.

Erasure Codes

People keep asking me about using erasure/rateless/error correcting codes in BitTorrent. It isn't done because, quite simply, it wouldn't help.

One possible benefit of erasure codes is that when sending data to a peer there are so many potential pieces that you can send any random one you have and it won't be a duplicate. The problem is that the peer may already have gotten that same piece from another peer, so that benefit is destroyed, and on top of that the overhead of communicating and remembering which peer has what is increased tremendously.

Possible benefit number two is that erasure codes increase the chances that your peers won't already have the pieces which you've downloaded. But simply downloading the pieces which the fewest of your peers have first, the rarest-first strategy, handles that problem quite nicely, so a vastly more complicated solution is unwarranted.
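For illustration, here's a minimal sketch of that rarest-first idea in Python. The function name and data structures are my own for this example, not BitTorrent's actual internals:

    import random
    from collections import Counter

    def pick_rarest_piece(my_missing, peer_bitfields):
        # my_missing: set of piece indices we still need.
        # peer_bitfields: one set per peer of the pieces that peer has.
        # Count how many peers hold each piece we're missing.
        availability = Counter()
        for bitfield in peer_bitfields:
            for piece in bitfield & my_missing:
                availability[piece] += 1
        if not availability:
            return None  # no peer has anything we need
        rarest = min(availability.values())
        # Break ties randomly so peers don't all chase the same piece.
        return random.choice(
            [p for p, n in availability.items() if n == rarest])

Downloading the rare pieces first keeps the distribution of pieces across the swarm even, which is the same effect the erasure-code argument is after.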

Possible benefit number three is that if there's no seed left erasure codes increase the chances that the entire file will be recoverable. In practice, when a file becomes unrecoverable it's because there was only one seed and several downloaders started from scratch, then the seed disappeared after uploading less than the total length of the file. Erasure codes obviously would not help out in that case.

There are other possible benefits and corresponding rebuttals, but they get more complicated. The short of it all is that the possible benefits of erasure codes can be had with much more straightforward and already implemented techniques, and the implementation difficulties of such codes are quite onerous.

While I'm pissing on everyone's parade, I should probably mention another scenario in which everyone wants to use erasure codes and it's a bad idea: off-site backup. If you store everything straightforwardly on three backup sites, and each site has two nines (99%) uptime (if it doesn't you shouldn't be using it for backup) then the overall reliability will be six nines (99.9999%), since all three sites would have to be down at once. Engineering for more than six nines is nothing but intellectual masturbation, because unforeseeable problems completely dominate failure at that point. Therefore one-of-three gets great reliability with unreliable backup sites in exchange for having to store three times the amount of data you're backing up.
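To make that arithmetic concrete, here's the calculation in Python. The function name is mine, and the key assumption, worth stating, is that site outages are independent:

    def replication_availability(p_site, n_sites):
        # Probability that at least one of n independent full copies
        # is reachable: 1 minus the chance that all are down at once.
        return 1 - (1 - p_site) ** n_sites

    # Three full copies on sites with two-nines (99%) uptime:
    print(replication_availability(0.99, 3))  # 0.999999 -- six nines

Correlated failures, say a shared vendor or a common network, would of course break the independence assumption and cost you some of those nines.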

With erasure codes, you could make it so that each backup site only had to store half as much stuff, but two of them would still need to be up to recover the data. With four backup sites, that's a savings of a third of the storage versus the much more straightforward approach. This is a pretty small reduction given that the price of mass storage is very small and plummeting rapidly. It also comes at great expense: you have to deal with four backup sites instead of three, and the software is much more complicated. In systems like this, the recovery software not working accounts for a significant part of the chance of the system as a whole failing. Any economic benefit from saving disk space must also be weighed against the cost of the software system which manages it. Given the ludicrous prices of backup systems these days, a much simpler albeit slightly less efficient one would probably be a great value.
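The same sketch extends to a k-of-n erasure-coded scheme, again under the independence assumption and with names of my own choosing:

    from math import comb

    def k_of_n_availability(p_site, n, k):
        # Probability that at least k of n independent sites are up,
        # i.e. that a k-of-n erasure-coded backup is recoverable.
        return sum(comb(n, i) * p_site**i * (1 - p_site)**(n - i)
                   for i in range(k, n + 1))

    # 2-of-4 coding: each site stores half the data, 2x total storage.
    print(k_of_n_availability(0.99, 4, 2))  # ~0.999996
    # 1-of-3 replication: 3x total storage.
    print(k_of_n_availability(0.99, 3, 1))  # 0.999999

So the coded scheme saves a third of the storage but actually gives up a little reliability, on top of the extra site and the extra software.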

ECC of course has some great uses, for example data transmission over noisy media and storing data on media which can get physically corrupted, and recent developments in it are very exciting, but it's very important to only use sophisticated tools when clearly warranted.

