Advogato: Blog for mojotoad

It's storming outside. It sounds very nice; I have the windows cracked open, so I will no doubt sleep quite soundly.

I've been retooling HTML::TableExtract in a major way. I've fixed header extraction to account for the nastiness you get from colspan and rowspan effects, so that the columns you extract are the columns you would expect when looking at the table visually. (Such is the fare of those who deal with sparse trees representing grids.) In more exciting realms, I've been implementing search chains, which allow you to yank tables relative to other tables using lists of checkpoints in terms of headers, depths, counts, or some arbitrary chain thereof. Time permitting, the new release should be set loose within a couple of days, once I've tested it to my satisfaction.
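To show why colspan and rowspan make header extraction nasty, here is a small sketch of the general grid-expansion idea. It is not HTML::TableExtract's code (the module is Perl). It is a hypothetical Python example that assumes the table has already been parsed into rows of `(text, rowspan, colspan)` tuples with positive integer spans. `expand_table` and `column_for_header` are made-up names. Each spanning cell is copied into every grid slot it covers, so a header's index in the grid matches the column you see in the browser.

```python
def expand_table(rows):
    """Lay (text, rowspan, colspan) cells out on a rectangular grid.

    Hypothetical helper: assumes positive integer spans (HTML's rowspan="0"
    is not handled). A spanning cell's text is repeated in every slot it covers.
    """
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    nrows = max(r for r, _ in grid) + 1
    ncols = max(c for _, c in grid) + 1
    # Slots never filled (ragged rows) come back as None.
    return [[grid.get((r, c)) for c in range(ncols)] for r in range(nrows)]


def column_for_header(grid, header, header_row=0):
    """Return the visual column index of `header` in the given grid row."""
    for c, text in enumerate(grid[header_row]):
        if text == header:
            return c
    raise KeyError(header)


table = [
    [("Name", 2, 1), ("Score", 1, 2)],  # "Name" spans two rows, "Score" two columns
    [("Q1", 1, 1), ("Q2", 1, 1)],       # this row's cells sit under "Score"
    [("alice", 1, 1), ("3", 1, 1), ("5", 1, 1)],
]
grid = expand_table(table)
col = column_for_header(grid, "Q2", header_row=1)
print(col)                             # 2, although "Q2" is the 2nd cell in its row
print([row[col] for row in grid[2:]])  # ['5']
```

Counting cells naively would put "Q2" in column 1 and pull the wrong data; expanding the spans first is what keeps the extracted column aligned with what you see on screen.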

Viva data mining, free of HTML context. Other than being in a table somewhere on a page, of course.

2 May 2000 mojotoad » (Journeyer)