Recent blog entries for lucasgonze

CC Mixter got Slashdotted today. CC Mixter is a place for 'mixversation', a kind of conversational flow of remixes. The music is all under the Creative Commons sampling license. My contribution to this project was to create the base code and proof of concept.

XSPF version 0, a playlist format that I and others have been working on for about a year, is frozen and finalized. I have reformatted the specification as an Internet-Draft and posted the first draft of XSPF version 1. My contribution to XSPF is to lead the project.
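To make the format concrete, here is a minimal sketch in Python that emits a one-track XSPF playlist. The element names and the http://xspf.org/ns/0/ namespace follow the spec as published; the version attribute and example URL are illustrative, not taken from any real playlist.

    import xml.etree.ElementTree as ET

    XSPF_NS = "http://xspf.org/ns/0/"

    def make_playlist(urls):
        # Register XSPF as the default namespace so the output is unprefixed.
        ET.register_namespace("", XSPF_NS)
        playlist = ET.Element("{%s}playlist" % XSPF_NS, version="1")
        tracklist = ET.SubElement(playlist, "{%s}trackList" % XSPF_NS)
        for url in urls:
            track = ET.SubElement(tracklist, "{%s}track" % XSPF_NS)
            # A track's location is a URL, not a local path.
            ET.SubElement(track, "{%s}location" % XSPF_NS).text = url
        return ET.tostring(playlist, encoding="unicode")

    print(make_playlist(["http://example.com/song.ogg"]))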

While I was adding MusicBrainz metadata to the playlist format survey yesterday, I realized that MusicBrainz is the only format in the survey that follows best practices. It invents a minimum of new names (though there's a bit of overlap with the RSS 1.0 audio namespace). It uses very precise definitions of data elements. It doesn't abuse external namespaces. The documentation is clear and unambiguous.

Given what a disaster music metadata formats as a whole are, this is a pretty big accomplishment.

I am happy to say that my Creative Commons SMIL module seems to have gained consensus acceptance. Several playlist authors are using it, Oyez.org is applying it to all their SMIL presentations, and Creative Commons recommends it on their how-to page.

My main interest these days is playlists, particularly playlists that contain URLs rather than local paths. I realize this is a hopelessly obscure interest.

It struck me that my loose notes on different playlist formats might be useful to people who work with playlists, so I made them into a formally structured document. The results are at http://gonze.com/playlists/playlist-format-survey.html.

1 Jun 2003 (updated 1 Jun 2003 at 19:00 UTC)

I spent the weekend looking at WASTE. I am afraid that this will be yet another instrument for whipping up anti-computer hysteria.

28 Apr 2003 (updated 29 Apr 2003 at 02:34 UTC)

Posted a new tool to Freshmeat:

m3udo allows you to apply a command to each line of a text file in the M3U format. It is similar to the xargs command, except that it supports a number of niceties useful for batch processing. Things you might do with it include moving a collection of files to a common directory, converting all files from one format to another, or calculating FFTs of an entire album.
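For flavor, here is a rough Python sketch of the core idea. This is not m3udo itself; the {} placeholder convention is borrowed from xargs -I, not taken from m3udo's real interface.

    import subprocess
    import sys

    def m3u_entries(path):
        # Yield playlist entries, skipping blank lines and
        # comment lines such as #EXTM3U and #EXTINF.
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    yield line

    def main():
        playlist, command = sys.argv[1], sys.argv[2:]
        for entry in m3u_entries(playlist):
            # Substitute the entry for each {} placeholder, xargs -I style.
            subprocess.call([arg.replace("{}", entry) for arg in command])

    if __name__ == "__main__":
        main()

Invoked as, say, "python m3udo_sketch.py album.m3u oggenc {}", it would run the command once per track.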

This is part of an ongoing project to build useful Unix utilities that are small, self-contained, ultra-robust, and observe conventions. That's hard to do even when the application is as simple as m3udo, but I feel a little dumb doing something so simple.

One of the things stopping me from doing more of these is that loose utilities have a low chance of being picked up by mainstream distributions. 'lockfile', for example, spread around because it is part of procmail.

I find it incredibly satisfying to do something so concrete.

I have a comment to make on the thread about flaws in the certification system, but I can't do it because the certification system limits me to writing in this diary. Diary writing is talking to yourself, which I only do (1) in private or (2) when the voices won't leave me alone.

So, no diary.

Per Raph's comments on using spanning trees to make a scalable Gnutella-like network:

====
Thus, in order to make a fully decentralized Napster-like service work, you need to do intelligent distribution of the searches. Specifically, while the search metadata needs to be distributed across all servers in the system, only a small number of servers should be needed for any one search.

Here, I'll outline a very simple approach for single-keyword searching. Assume that each server has a hash-derived ID as in Mojo Nation. Hash the keyword. All servers whose IDs match the first k bits are authoritative for that keyword. If you want to query based on that keyword, you need only find a single such server and query it. If you want to publish an item containing that keyword, you need to notify all such authoritative servers.
====

Point #1 is on target. The only way to reduce inefficiency is to minimize path lengths, which means that you have to avoid random searches, which means that you have to find ways to predict which resource providers might do the best job.

Point #2 is one idea among many. The goal is right - to map resource requests to likely providers with the greatest possible accuracy. But the approach is funny, because it doesn't take into account all the possible reasons why one node should be providing resources rather than another. Maybe the serving node should be the one with the most available connection slots, or it should be the one with the highest quality data, or it should be the one that is most interested in serving the data.

My point is that improving the mapping method is a good idea, but there should be qualitative reasons for mapping to one node rather than another.
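To make the quoted scheme concrete, here is a minimal Python sketch of prefix-based keyword authority. SHA-1 and k=8 are my arbitrary choices, and server IDs are modeled as 160-bit integers that are themselves hash-derived, as in Mojo Nation.

    import hashlib

    ID_BITS = 160  # SHA-1 output width; hash-derived server IDs share it
    K = 8          # leading bits that define an authority zone (arbitrary)

    def top_bits(n, k=K, width=ID_BITS):
        # First k bits of a width-bit integer.
        return n >> (width - k)

    def keyword_id(keyword):
        # Hash the keyword into the same 160-bit space as server IDs.
        digest = hashlib.sha1(keyword.encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    def authoritative_servers(keyword, server_ids):
        # Every server whose ID shares its first K bits with hash(keyword)
        # is authoritative: notify all of them on publish, query any one.
        target = top_bits(keyword_id(keyword))
        return [sid for sid in server_ids if top_bits(sid) == target]

With k=8 a keyword maps to roughly 1/256 of the servers, which is the point: publish cost stays bounded, and a query needs to reach only one member of the zone.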
