Older blog entries for adubey (starting at number 1)

Random thoughts... does anyone else find Advogato's interface a bit lacking?

Often, I find I have to click through 2-3 pages to do common things like post a diary entry. In general, I feel like it's harder than it should be to do the things I want to do, or that I do often. While this feeling might go away after I get used to it, perhaps it isn't as welcoming to new users.

Other random thoughts... sometimes people post diary entries that have interesting ideas that are worth discussing. While email is a suitable route for that, wouldn't it be cool if you could reply to those?

Even more random... first there were linear news readers, then there were "threaded" news readers. The threads are essentially trees, where each node has one parent. Now the web goes one better than threaded discussions: we have directed graphs instead of trees. But short of a WikiWikiWeb-type CGI program, the web is geared to "read this" rather than "comment on this". Mightn't it be cool if you had some kind of "threaded" system where posts may have more than one parent? Someplace where you could bring together separate discussions that are mutually relevant instead of forever splitting them. For example, you could join a discussion hanging off an article with one hanging off a diary entry. Ah... if only I had the time... anyways, I'm getting far too random for my own good.
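To make the "more than one parent" idea a bit more concrete, here is a minimal sketch in Python. Everything in it (the `Post` and `Discussion` names, the `add` and `ancestors` methods) is made up for illustration and is not an existing system. It assumes a post may only name parents that already exist, which keeps the graph acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One post; unlike a tree node, it may have several parents."""
    post_id: str
    text: str
    parents: list = field(default_factory=list)


class Discussion:
    """Hypothetical store for posts that form a directed acyclic graph."""

    def __init__(self):
        self.posts = {}

    def add(self, post_id, text, parents=()):
        if post_id in self.posts:
            raise ValueError(f"duplicate post id: {post_id}")
        # Parents must already exist, so a post can never be its own ancestor.
        for parent in parents:
            if parent not in self.posts:
                raise KeyError(f"unknown parent: {parent}")
        self.posts[post_id] = Post(post_id, text, list(parents))

    def ancestors(self, post_id):
        """Return the ids of every post reachable by following parent links."""
        seen = set()
        stack = list(self.posts[post_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.posts[current].parents)
        return seen


# Two separate discussions, joined by one post that replies to both.
thread = Discussion()
thread.add("article", "An article")
thread.add("diary", "A diary entry")
thread.add("a1", "Comment on the article", parents=["article"])
thread.add("d1", "Comment on the diary entry", parents=["diary"])
thread.add("merge", "These two threads are about the same thing", parents=["a1", "d1"])
print(sorted(thread.ancestors("merge")))
```

The "merge" post is the part a plain threaded reader can't express: it sits under both discussions at once.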

Anyways, my probabilistic parser is stuck right now: the training data is in an ass-backwards format in which words aren't necessarily given a part of speech. In other words, rules have the form Nonterminal->(Terminal|Nonterminal)* rather than Nonterminal->Terminal* | Nonterminal->Nonterminal*. Of course, I could split things up myself (by putting the grammar in CNF), but then I will lose some generalizations. For example, in AP-> NP and NP | AP-> VP and VP NP, each "and" will get a different non-terminal, but I want only one. Sometimes the POS will be different, so I can't say "always make 'and' an adjunct". Grr... I could use a part-of-speech tagger, but then I have to either 1) link the C POS tagger to ML or 2) play around with the training data's nonterminals to make them compatible with the tagger's tags. Alternatively, I could shell out $2500 for a good training set...
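A rough sketch of the trade-off, in Python rather than ML and under one reading of the problem: a naive splitting step invents a fresh preterminal for every rule that mixes words with nonterminals, while a shared scheme gives each word a single preterminal. The function `lift_terminals`, the `share` flag and the generated names (`R0_AND`, `T_AND`) are all hypothetical, and this covers only the terminal-lifting step, not a full CNF conversion.

```python
def lift_terminals(rules, terminals, share):
    """Replace terminals in mixed right-hand sides with preterminal symbols.

    rules     -- list of (lhs, rhs) pairs, where rhs is a tuple of symbols
    terminals -- set of symbols that are words rather than nonterminals
    share     -- True: one preterminal per word; False: one per rule and word
    """
    new_rules = []
    preterminals = {}  # key -> (generated preterminal name, word)
    for index, (lhs, rhs) in enumerate(rules):
        if all(sym in terminals for sym in rhs):
            # Purely lexical rule: nothing to split.
            new_rules.append((lhs, tuple(rhs)))
            continue
        new_rhs = []
        for sym in rhs:
            if sym in terminals:
                key = sym if share else (index, sym)
                if key not in preterminals:
                    prefix = "T" if share else f"R{index}"
                    preterminals[key] = (f"{prefix}_{sym.upper()}", sym)
                new_rhs.append(preterminals[key][0])
            else:
                new_rhs.append(sym)
        new_rules.append((lhs, tuple(new_rhs)))
    # Add the lexical rules that rewrite each new preterminal to its word.
    for name, word in preterminals.values():
        new_rules.append((name, (word,)))
    return new_rules


grammar = [
    ("AP", ("NP", "and", "NP")),
    ("AP", ("VP", "and", "VP", "NP")),
]
for rule in lift_terminals(grammar, {"and"}, share=False):
    print(rule)
for rule in lift_terminals(grammar, {"and"}, share=True):
    print(rule)
```

With `share=False` the two occurrences of "and" end up under unrelated symbols, so their counts can't be pooled. With `share=True` they are pooled, but every "and" is forced into one category, which is wrong whenever the POS really does differ.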

This is my first diary entry. I've just added three "projects" although I'm only actively working on two of them.

They are Simulus (a game); a really bad first attempt at making a speech interface to a software application, in this case gnumeric (this is the one that I've abandoned for now); and finally, my probabilistic parser.

My short-term goal is getting the parser to a workable state. When that's done I'll probably post another diary entry. I don't intend to touch the game until September.
