markpasc is currently certified at Journeyer level.

Name: Mark Paschal
Member since: 2001-12-20 05:59:08
Last Login: N/A



Recent blog entries by markpasc

Syndication: RSS 2.0

I don't think I've mentioned it, but my current project is winget, a Windows port of GNU wget with an actual Windows interface. This is about the biggest thing I've done to date, being an actual software project, so I'm very pleased with even just how much I've done so far!

Kit 1.1.6 is out. It adds a Radio to the Past form bit to the weblog post page, and incorporates a couple of minor fixes I'm going to let the Kit page claim I released as 1.1.5.

I've not been spending a lot of time in Radio-land lately, and will have to carefully consider it, since I may be ditching Windows in the not too distant future. I've invested enough in Radio that I should probably keep using it, but sunk time is a bad decision-making factor.

24 May 2002 (updated 24 May 2002 at 06:24 UTC)

The next version of Stapler is chock-full (chockful?) of HTTP headery goodness.

So find some more bugs so I can put it out.

The headers in question are If-Modified-Since and User-Agent. Stapler identifies itself to the server as Stapler/x.y.z, and remembers the Last-Modified and Date headers (actually, all of them) so it can parrot them back in If-Modified-Since and get a 304 Not Modified, as the spec suggests. Voila:

x.y.z.w - - [24/May/2002:01:58:29 -0400] "GET / HTTP/1.0" 304 - "-" "Stapler/2.0.1"
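For the curious, here is a minimal sketch of that kind of conditional GET using only Python's standard library. It is not Stapler's code: the `STAPLER_VERSION` constant, the function names, and the fallback from `Last-Modified` to `Date` when no `Last-Modified` was seen are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical version string; the log line above shows Stapler/2.0.1.
STAPLER_VERSION = "2.0.1"


def build_conditional_request(url, cached_headers):
    """Build a GET that names the client and, if the previous response
    carried a timestamp, asks the server to answer 304 if nothing changed."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Stapler/" + STAPLER_VERSION)
    # Prefer Last-Modified; falling back to Date is an assumption here.
    stamp = cached_headers.get("Last-Modified") or cached_headers.get("Date")
    if stamp:
        req.add_header("If-Modified-Since", stamp)
    return req


def fetch(url, cached_headers):
    """Return (body, headers); body is None when the server replies 304."""
    req = build_conditional_request(url, cached_headers)
    try:
        with urllib.request.urlopen(req) as resp:
            # Remember every response header for the next poll.
            return resp.read(), dict(resp.headers)
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "keep the cache".
        if err.code == 304:
            return None, cached_headers
        raise
```

The caller keeps whatever headers come back and hands them in on the next poll; a `None` body means the page hasn't changed and the feed can be left alone.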

Next step would be to honor robots.txt files. Suppose I should put a referrer in, too, hmm. Might also be nice to say I'm using HTTP/1.1, but I'm not sure if I can.
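Honoring robots.txt doesn't mean writing a parser from scratch. Python's standard `urllib.robotparser` already implements the exclusion rules. Here is a sketch, assuming the crawler matches rules under the product token `Stapler` (hypothetical, since the entry doesn't say which name Stapler would look for):

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

USER_AGENT = "Stapler"  # hypothetical product token to match in robots.txt


def robots_url(page_url):
    """robots.txt always lives at the root of the host."""
    return urljoin(page_url, "/robots.txt")


def allowed_by_rules(robots_lines, page_url, agent=USER_AGENT):
    """Check a URL against robots.txt content already in hand."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, page_url)


def allowed_live(page_url, agent=USER_AGENT):
    """Fetch the site's robots.txt and check the URL against it.
    Note: read() uses urllib's default User-Agent, not Stapler's."""
    parser = RobotFileParser(robots_url(page_url))
    parser.read()
    return parser.can_fetch(agent, page_url)
```

A real poller would probably cache one parser per host rather than re-reading robots.txt before every fetch.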

23 May 2002 (updated 23 May 2002 at 05:43 UTC)
Radio to the Whatever

It's rather depressing to find such a showstopping bug in Kit's Radio to the Past tool. I hadn't heard about it and didn't realize it was there, which means no one at all both used the thing and had the decency to drop me a note about it. After all the noise in the groups about it I figured someone might at least try the thing... but not so.

I've started planning for the next version of Stapler, in which everything old is new again under a different name and in a different place. Meanwhile the version of Stapler on my desktop and the one on the website are different, so I'm releasing the former as a "bugfix" version, 1.7.4.

One big idea (as in "What's the big idea?") will cause most of the change and provide a convenient excuse for the rest: eliminating the source-feed dichotomy. Since this is quite a big change, the next version of Stapler will, at least for now, be numbered 2.0 (0 as in "oh, boy").

Most sources require a corresponding feed, which I obviously realized, since I added a "Make feed for this source" button not too long ago. However, the entire distinction is a holdover from Stapler's original purpose, a feed of web comics, which is one of the few cases where it's better to have multiple sources in one feed.

So out go sources vs. feeds--but you'll still be able to do the same thing, of course. (I'm not giving up my web comics feed yet.) Stapler 2.0 will let users disable writing a feed to disk independently of turning its updating on or off, and will include an "aggregate" scraper that combines the items of other feeds--presumably ones with disk writing turned off--into one feed. In practice, where you had a feed for one source because of Stapler's design, you'll have one feed, and where you aggregated four sources into one feed, for some value of four, you'll have 4+1 feeds, only one of which has disk writing enabled.
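The plan above boils down to a single kind of feed with two independent switches, plus one feed type that merges others. A hypothetical Python data model makes the 4+1 arrangement concrete. `Feed`, `Item`, `aggregate`, the field names, and the newest-first ordering with a 15-item cap and duplicate-link removal are assumptions for illustration, not Stapler's actual design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    title: str
    link: str
    date: str  # ISO 8601 (YYYY-MM-DD), so string order is chronological


@dataclass
class Feed:
    name: str
    updating: bool = True        # whether the feed is refreshed at all
    write_to_disk: bool = True   # toggled independently of updating
    items: List[Item] = field(default_factory=list)


def aggregate(inputs: List[Feed], limit: int = 15) -> List[Item]:
    """Merge items from other feeds, newest first, dropping duplicate links."""
    seen = set()
    merged = []
    for feed in inputs:
        for item in feed.items:
            if item.link not in seen:
                seen.add(item.link)
                merged.append(item)
    merged.sort(key=lambda it: it.date, reverse=True)
    return merged[:limit]


def feeds_to_write(feeds: List[Feed]) -> List[Feed]:
    """Only feeds with disk writing enabled produce files."""
    return [f for f in feeds if f.write_to_disk]
```

For the web comics case, four comic feeds would be created with `write_to_disk=False`, and a fifth feed would be built from `aggregate(...)` over them, giving five feeds but only one file on disk.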

So maybe it's not such a hot idea, having a sourcefeed that can be sourcelike or feedlike or both; but it seems like a good idea at the moment.

In addition to that change, some things are being renamed to make for (I hope) clearer nomenclature. Instead of the antiquated and scary "scraper", feeds will have "extractors". Instead of having "document types", feeds will have "formats". Those are the name changes I foresee now, but I'm sure one or two more will sneak in.

Oh, and the "ByNumbers" extractor becomes "By selector." Duh.

Ideally, of course, I would write a script that converts a 1.7.4 StaplerData table to a 2.0 one. In fact, that's how I refined the new data model, figuring out how to turn the old into the new. But I'd really rather not, since it's complicated, and anyone with custom scrapers or document types will have work to do anyway. (But then, I suppose that's actually very few people, so perhaps it is worthwhile.)
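The conversion itself follows directly from the 4+1 rule. Here is a sketch in Python, assuming a hypothetical 1.7.4 layout in which each source names its scraper and each feed lists its sources. The real StaplerData table isn't described here, and custom scrapers or document types aren't handled. Only the `ByNumbers` to `By selector` rename comes from the plan above.

```python
# Hypothetical 1.7.4 layout: sources map to a scraper name, feeds list sources.
RENAMED_EXTRACTORS = {"ByNumbers": "By selector"}


def extractor_for(scraper):
    """Carry an old scraper name over to its new extractor name."""
    return RENAMED_EXTRACTORS.get(scraper, scraper)


def convert(old_sources, old_feeds):
    """old_sources: {source_name: scraper_name}
    old_feeds: {feed_name: [source_name, ...]}
    Returns a list of dicts describing 2.0 feeds."""
    new_feeds = []
    emitted = set()  # sources already turned into hidden feeds
    for feed_name, source_names in old_feeds.items():
        if len(source_names) == 1:
            # One source, one feed: collapse them into a single 2.0 feed.
            src = source_names[0]
            new_feeds.append({"name": feed_name,
                              "extractor": extractor_for(old_sources[src]),
                              "write_to_disk": True})
            continue
        # Several sources: one hidden feed per source plus one aggregate.
        for src in source_names:
            if src not in emitted:
                emitted.add(src)
                new_feeds.append({"name": src,
                                  "extractor": extractor_for(old_sources[src]),
                                  "write_to_disk": False})
        new_feeds.append({"name": feed_name, "extractor": "aggregate",
                          "inputs": list(source_names), "write_to_disk": True})
    return new_feeds
```

One gap this sketch leaves open: a source that shares its name with an old feed would collide, so a real converter would need to rename one of them.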

As is apparent, 2.0 is still very much in the planning stage, though it would be nice to have a copy to release 17 May, since that's the day I released version 1.0.1 last year. (I'm not sure when I released 1.0; I guess I could look it up in my blog archives, but I can't be arsed just now.) Just a heads up for y'all who actually care.

10 older entries...


markpasc certified others as follows:

  • markpasc certified markpasc as Apprentice
  • markpasc certified jimw as Master
  • markpasc certified aaronsw as Journeyer
  • markpasc certified rcaden as Apprentice

Others have certified markpasc as follows:

  • markpasc certified markpasc as Apprentice
  • jimw certified markpasc as Apprentice
  • josh certified markpasc as Apprentice
  • mdekkers certified markpasc as Apprentice
  • myelin certified markpasc as Journeyer
  • whytheluckystiff certified markpasc as Journeyer
  • rcaden certified markpasc as Apprentice
  • aaronsw certified markpasc as Apprentice
  • rvr certified markpasc as Journeyer
  • aramin196 certified markpasc as Apprentice
  • dltxprt certified markpasc as Journeyer


