6 May 2002 markpasc   » (Journeyer)

I've started planning for the next version of Stapler, in which everything old is new again under a different name and in a different place. Meanwhile the version of Stapler on my desktop and the one on the website are different, so I released the former as a "bugfix" version, 1.7.4.

One big idea (as in "What's the big idea?") will cause most of the change and provide a convenient excuse for the rest: eliminating the source-feed dichotomy. Since this is quite a big change, the next version of Stapler will, at least for now, be numbered 2.0 (0 as in "oh, boy").

Most sources required a corresponding feed, which I obviously realized since I added a "Make feed for this source" button not too long ago. However, the entire distinction is a holdover from Stapler's original purpose of producing a feed of web comics, one of the few cases where it's better to have multiple sources in one feed.

So out go sources vs feeds--but you'll still be able to do the same thing, of course. (I'm not giving up my web comics feed yet.) Stapler 2.0 will allow users to disable writing feeds to disk independently of toggling their actual updating, and will include an "aggregate" scraper that aggregates the items of other feeds--presumably ones with disk writing turned off--into one feed. Where you had a feed for one source only because of Stapler's design, you'll have one feed, and where you aggregated four sources into one feed, for some value of <dfn>four</dfn>, you'll have 4+1 feeds, only one of which has disk writing enabled.
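To make the 4+1 arrangement concrete, here is a rough Python sketch. It is not Stapler's code: the `Feed` class, the `update_enabled` and `write_to_disk` flags, and the two extractor functions are all names invented for illustration, and the sketch assumes feeds are updated in an order that puts an aggregate after its inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Feed:
    # Hypothetical unified "sourcefeed": one object that can act as a
    # source, as a feed, or as both.
    name: str
    extractor: Callable[..., List[str]]
    update_enabled: bool = True   # toggles updating
    write_to_disk: bool = True    # toggled independently of updating
    inputs: List[str] = field(default_factory=list)
    items: List[str] = field(default_factory=list)


def static_extractor(items: List[str]):
    """Stand-in for a real extractor: always yields the same items."""
    def extract(feed: Feed, feeds: Dict[str, Feed]) -> List[str]:
        return list(items)
    return extract


def aggregate_extractor(feed: Feed, feeds: Dict[str, Feed]) -> List[str]:
    """Collect the current items of the feeds named in feed.inputs."""
    combined: List[str] = []
    for name in feed.inputs:
        combined.extend(feeds[name].items)
    return combined


def update_all(feeds: Dict[str, Feed]) -> List[str]:
    """Update every enabled feed in dict order and return the names of
    the feeds that would be written to disk (no files are written here)."""
    written = []
    for feed in feeds.values():
        if not feed.update_enabled:
            continue
        feed.items = feed.extractor(feed, feeds)
        if feed.write_to_disk:
            written.append(feed.name)
    return written


# Four hidden comic feeds plus one aggregate: 4+1 feeds, one on disk.
comic_names = ["comic-a", "comic-b", "comic-c", "comic-d"]
feeds: Dict[str, Feed] = {}
for title in comic_names:
    feeds[title] = Feed(title, static_extractor([title + " strip"]),
                        write_to_disk=False)
feeds["comics"] = Feed("comics", aggregate_extractor, inputs=comic_names)

written = update_all(feeds)
print(len(feeds), written)
```

The point of the design is that "source" stops being a separate kind of thing: a hidden feed is just a feed with disk writing off, and the aggregate is just another extractor.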

So maybe it's not such a hot idea, having a sourcefeed that can be sourcelike or feedlike or both; but it seems like a good idea at the moment.

In addition to that change, some things are changing name to make for (I hope) clearer nomenclature. Instead of the antiquated and scary <dfn>scraper</dfn>, feeds will have <dfn>extractors</dfn>. Instead of having <dfn>document types</dfn>, feeds will have <dfn>formats</dfn>. Those are the name changes I foresee now, but I'm sure one or two more will sneak in.

Oh, and the "ByNumbers" extractor becomes "By selector." Duh.

Ideally, of course, I would write a script that converts a 1.7.4 StaplerData table to a 2.0 one. In fact, that's how I refined the new data model, figuring out how to turn the old into the new. But I'd really rather not, since it's complicated, and anyone with custom scrapers or document types will have work to do anyway. (But then, I suppose that's actually very few people, so perhaps it is worthwhile.)
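For a sense of what such a conversion might involve, here is a small Python sketch of the old-to-new mapping. The real StaplerData layout isn't described here, so the dictionary shapes and key names (`sources`, `feeds`, `scraper`, `documentType`, `writeToDisk`, `inputs`) are assumptions made up for the example; only the renames and the one-feed versus N+1-feeds rule come from the plan above.

```python
# Renames from the plan above; the table layout itself is hypothetical.
RENAMED_EXTRACTORS = {"ByNumbers": "By selector"}


def rename(scraper_name):
    """Map an old scraper name to its new extractor name."""
    return RENAMED_EXTRACTORS.get(scraper_name, scraper_name)


def convert(old):
    """Turn an assumed 1.7.4-style table into an assumed 2.0-style one.

    Simplifications: sources not used by any feed are dropped, and
    name clashes between sources and feeds are not handled.
    """
    new = {}
    for feed_name, feed in old["feeds"].items():
        source_names = feed["sources"]
        if len(source_names) == 1:
            # A feed that existed only because its source needed one
            # collapses into a single feed.
            source = old["sources"][source_names[0]]
            new[feed_name] = {
                "extractor": rename(source["scraper"]),
                "format": feed["documentType"],
                "writeToDisk": True,
            }
        else:
            # N sources in one feed become N hidden feeds plus one
            # aggregate feed that is written to disk.
            for name in source_names:
                source = old["sources"][name]
                new.setdefault(name, {
                    "extractor": rename(source["scraper"]),
                    "format": None,
                    "writeToDisk": False,
                })
            new[feed_name] = {
                "extractor": "aggregate",
                "inputs": list(source_names),
                "format": feed["documentType"],
                "writeToDisk": True,
            }
    return new


old_table = {
    "sources": {
        "news-src": {"scraper": "ByNumbers"},
        "comic-a": {"scraper": "ByNumbers"},
        "comic-b": {"scraper": "custom"},
    },
    "feeds": {
        "news": {"documentType": "rss", "sources": ["news-src"]},
        "comics": {"documentType": "rss", "sources": ["comic-a", "comic-b"]},
    },
}
new_table = convert(old_table)
print(sorted(new_table))
```

Custom scrapers and document types pass through under their old names here, which is exactly the part that would still need work by hand.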

As is apparent, 2.0 is still very much in the planning stage, though it would be nice to have a copy to release 17 May, since that's the day I released version 1.0.1 last year. (I'm not sure when I released 1.0; I guess I could look it up in my blog archives, but I can't be arsed just now.) Just a heads up for y'all who actually care.
