schycyroll and missing dates
I have a working schycyroll running locally, updating pages from local mirrors of RSS files. I still run it by hand, and will until I'm happy I've caught most of the crashes caused by evil invalid RSS files.
For some reason I don't understand, it doesn't like LiveJournal's HTML. I can understand it not liking Advogato's old HTML 4, but even when the LJ markup looks like XHTML, it refuses to parse it. I'll look at that Real Soon Now.
I had trouble with feeds that have missing or invalid date stamps. If the date stamp is invalid, the MaBloss library makes a new date stamp. As a result, every entry without a good stamp got a new stamp on each run, which wasn't the desired effect. My current solution is to update only the description of entries that already exist and leave their stored stamp alone, but I'm sure that will bite me later.
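As a rough sketch of that workaround (in Python, with made-up names; this is not the schycyroll or MaBloss code, and keying entries by their link is my assumption here): a new entry gets its parsed stamp, or the current time if the stamp is unusable, and an entry that is already stored only has its description refreshed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_stamp(raw):
    """Return an aware datetime for an RFC 822 date string, or None if unusable."""
    if not raw:
        return None
    try:
        stamp = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad input.
        return None
    if stamp.tzinfo is None:
        # Treat stamps without a usable zone as UTC (an assumption).
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def merge_feed(store, items, now=None):
    """Merge feed items into store, a dict keyed by item link (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    for item in items:
        key = item["link"]
        if key in store:
            # Existing entry: refresh the description only, keep the old stamp.
            store[key]["description"] = item.get("description", "")
        else:
            # New entry: use the feed's stamp if it parses, else stamp it once.
            store[key] = {
                "description": item.get("description", ""),
                "stamp": parse_stamp(item.get("pubDate")) or now,
            }
    return store
```

The catch is visible in the sketch: if a feed later corrects a date, the stored stamp never changes, which is probably one of the ways this will bite me.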
I guess I should look at Planet and spycyroll to see what I can learn from them, now that I have a basic design based on the set theory I posted earlier. I wonder how they handle these problems and what other problems I still need to handle. From a quick glance, spycyroll doesn't seem to handle a basic problem: when do you remove old blog items?
Time to update MaBloss with a package including schycyroll (1.1.1). Next trick will be to build some planets on a publicly accessible server.