2 May 2000 mojotoad   » (Journeyer)

It's storming outside. Sounds very nice; I have the windows cracked open so I will no doubt sleep quite soundly.

I've been retooling HTML::TableExtract in a major way. I've fixed header extraction to account for the nastiness you get from colspan and rowspan effects, so that the columns you extract are the columns you would expect when looking at the table visually (such is the fate of those who deal with sparse trees representing grids). In more exciting realms, I've been implementing search chains, which allow you to yank tables relative to other tables using lists of checkpoints expressed in terms of headers, depths, counts, or some arbitrary chain thereof. Time permitting, the new release should be set loose within a couple of days, once I've tested it to my satisfaction.
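To make the colspan/rowspan problem concrete, here is a rough sketch of the grid expansion idea. It is written in Python with only the standard library, and it is not how HTML::TableExtract does it internally; the names `GridParser`, `expand_table`, and `SAMPLE` are made up for illustration. It assumes a single, non-nested table whose cells all sit inside `<tr>` elements.

```python
from html.parser import HTMLParser


class GridParser(HTMLParser):
    """Hypothetical helper: map table cells onto a visual (row, col) grid."""

    def __init__(self):
        super().__init__()
        self.grid = {}    # (row, col) -> cell text
        self.row = -1     # index of the current <tr>
        self.col = 0      # next candidate column in the current row
        self.cell = None  # (row, col, rowspan, colspan) of the open cell
        self.text = []    # text fragments collected for the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            rowspan = int(a.get("rowspan") or 1)
            colspan = int(a.get("colspan") or 1)
            # Skip slots already claimed by a rowspan from an earlier row.
            while (self.row, self.col) in self.grid:
                self.col += 1
            self.cell = (self.row, self.col, rowspan, colspan)
            self.text = []

    def handle_data(self, data):
        if self.cell is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            row, col, rowspan, colspan = self.cell
            value = "".join(self.text).strip()
            # Copy the cell's text into every grid slot it visually covers.
            for r in range(row, row + rowspan):
                for c in range(col, col + colspan):
                    self.grid[(r, c)] = value
            self.col = col + colspan
            self.cell = None


def expand_table(html):
    """Return the table as a list of rows, with spans expanded."""
    parser = GridParser()
    parser.feed(html)
    parser.close()
    if not parser.grid:
        return []
    n_rows = max(r for r, _ in parser.grid) + 1
    n_cols = max(c for _, c in parser.grid) + 1
    return [[parser.grid.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]


SAMPLE = """
<table>
<tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
<tr><th>Min</th><th>Max</th></tr>
<tr><td>alpha</td><td>1</td><td>9</td></tr>
</table>
"""

for line in expand_table(SAMPLE):
    print(line)
```

In the raw markup the second header row has only two cells, so "Min" would naively land in column 0. After expansion it sits in column 1, under "Score", which is where you see it when looking at the rendered table.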

Viva data mining, HTML context free. Other than being in a table somewhere on a page, of course.

