23 Jun 2002 raph   » (Master)


Most of the evening was spent in the ER, because Max had burned himself with a pair of tongs. As far as we can tell, he stuck them in the flame on the stove, then touched them to his cheek. We spent over two hours waiting, then the doctor looked at him for about 10 seconds. In any case, it was only a partial thickness burn, so with some antibiotic cream it should heal up nicely.

Stamp idea from Bram

Bram gave me a really nice gift this afternoon: an optimization for the bandwidth and storage consumed by stamps. Basically, his idea starts with stamps of value 2^k, then lets them get split in half up to k times.

In my previous thinking, a stamp contains some random bytes, an issuer id, connection info for the issuer (ip address and port), and an expiration time. When a node creates a new stamp, it uses /dev/random or whatever to make the bytes. Then it stores the stamp in its own database, and "issues" it to a peer, who may then trade it around the net. Finally, it gets sent back to the issuer for "redeeming". At that point, the issuer checks to see that the bytes are in the database (and haven't been redeemed already by someone else).

Bram's idea is simple (one of those "why didn't I think of that myself" ones). Stamp entropy is the root of a k-high tree. For each node in the tree with entropy k, the left child is AES_x(0), and the right child is AES_x(1). Now, instead of an opaque random block, you have a tree id (can be random; is shared by all nodes in the tree), a path (sequence of 0 and 1 bits to choose left and right nodes, respectively), and the result of the hash.

It's a nice technique, and it'll almost certainly be in the prototype when I finally release it. But what makes it a nice gift is that Bram obviously understands what stamps are all about and has been thinking about them.

Haiku licensing

Aaron Swartz has beautiful haiku explaining free software licenses (there's another one linked, but it's not relevant to free software):

MIT: take my code with you / and do whatever you want / but please don't blame me
LGPL: you can copy this / but make modified versions / free in source code form
GPL: if you use this code / you and your children's children / must make your source free

Aaron has placed these in the public domain. In other words:

I could even lie / and say I wrote it myself / the author cares not

Big companies

Somebody (I don't remember who, they've scrolled off the recentlog) said "that large corporations and free software are interested in" me. Sure, I can't help the fact that my altruism benefits these kinds of entities in addition to real people, but the fact doesn't excite me. When dealing with business entities, I prefer a business relationship; I provide them with something of value, they pay me money.

Free software licenses don't promote this goal, but dual-licensed GPL libraries are consistent with it. I recommend the approach highly, and plan to do most of my future work under it.

Graphs are not a precise model for software food chains

David McCusker writes:

Another exception related to performance occurs with caching. A reverse proxy server depends on an origin web server for content, and yet can serve that content faster than the origin web site when either of two things occurs. First, the proxy might be on a better network than the origin web site. And second, the proxy might have content cached most directly in the form for serving.

Don't get me wrong, I'm not overly attached to my theory, but I'm not sure this counts as an exception. Yes, the proxy depends on the origin server for content, but it might also make sense to say that the origin server depends on the proxy for efficient distribution of that content. Here, I don't think a graph edge says enough about the relationship.

Fault-tolerant techniques like RAID are a clear example, though. Here, the dependency relationship is a clean edge. The disk is clearly at a lower level than the RAID cluster, yet disk failure is hidden. Disk bandwidth has a linear relationship with bandwidth at the higher level, but the cluster has better bandwidth than the individual disk. When it comes to latency, though, everything is hard-limited by the disk.

There's another aspect in which simple graphs aren't a powerful enough model. A simple Unix utility may have only two dependencies: to a kernel, and to a C implementation. However, the kernel could be Linux or BSD, and the compiler could be (say) gcc or lcc. So we have four edges, but they're not all the same. The real structure is two groups of two.

An analogous situation occurs in food webs. To a first approximation, food sources are interchangeable. But again, there is structure. Humans need both starch and protein. It doesn't matter whether the starch is wheat or rice, but a diet of starch and meat is considerably more complete than one of two starches.

So a graph is an interesting starting place, I think, but it is an oversimplification. It would be very interesting to see whether people trying to apply network theory to other domains are developing more sophisticated models. I'm not aware of any.

Psyco vs Parrot

I feel that my last entry was a bit unfair. After all, Psyco and Parrot are both speculative projects with the potential to vastly improve the performance of dynamic scripting langauges, but also with a significant risk of failure.

Even so, my gut feeling is that Parrot has a solid chance of success, while Psyco is much more speculative. Parrot is based on fairly conservative ideas. There's a lot of literature on virtual machines, and the Parrot people seem fluent in it. By contrast, the Psyco webpage and distribution contain absolutely no references to any related work (that I could find). Maybe it's just my academic background, but it strikes me as a red flag.

In any case, I wish both the Parrot and Psyco teams huge success. Having to choose between multiple high-performance dynamic languages would be a good problem to have.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!