12 Apr 2004 slef   » (Master)

schycyroll and missing dates

I have a working schycyroll running locally, updating pages from local mirrors of RSS files. I still run it by hand, until I'm happy I've caught most of the deaths from evil invalid RSS files.

For some reason I don't understand, it doesn't like LiveJournal's HTML. I can understand it not liking Advogato's old HTML 4, but even when the LiveJournal markup looks like XHTML, it refuses to parse it. I'll look at that Real Soon Now.

I had trouble with feeds that have missing or invalid date stamps. If the date stamp is invalid, the MaBloss library makes a new date stamp. As a result, every entry without a good stamp got a fresh stamp on each run, which wasn't the desired effect. My current solution is to update only the description of existing entries from the RSS, leaving their stored date alone, but I'm sure that will bite me later.
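To make that workaround concrete, here is a rough sketch in Python. It is not the MaBloss code: the `store` dictionary, the `merge_entry` and `parse_stamp` names, and the use of an entry key such as a link are all assumptions made for illustration. The idea is that a fallback stamp is only ever assigned when an entry is first seen, and later runs touch nothing but the description.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_stamp(raw):
    """Return an aware datetime for an RFC 822 style date, or None if unusable."""
    if not raw:
        return None
    try:
        stamp = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad input.
        return None
    if stamp.tzinfo is None:
        # Treat stamps without a usable zone as UTC (an assumption).
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def merge_entry(store, key, description, raw_date, now=None):
    """Add a new entry, or refresh only the description of an existing one.

    `store` is a hypothetical dict mapping an entry key (e.g. its link)
    to {"date": ..., "description": ...}.
    """
    now = now or datetime.now(timezone.utc)
    existing = store.get(key)
    if existing is None:
        # First sighting: a bad or missing stamp falls back to "now", once.
        store[key] = {"date": parse_stamp(raw_date) or now,
                      "description": description}
    else:
        # Later runs: keep the stored date so the entry doesn't jump around.
        existing["description"] = description
    return store[key]
```

The catch, and probably where it will bite, is that a feed which later corrects a wrong date never gets that correction picked up, because the stored stamp always wins.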

I guess I should look at planet and spycyroll to see what I can learn from them, now that I have a basic design based on the set theory I posted earlier. I wonder how they handle these problems and what other problems I still need to handle. From a quick glance, spycyroll doesn't seem to handle a basic problem: when do you remove old blog items?

Time to update MaBloss with a package including schycyroll (1.1.1). The next trick will be to build some planets on a publicly accessible server.

