19 Jan 2001 dchud   » (Journeyer)

Never thought much about making diary entries a regular thing, but the new setup means less direct interaction with other people, so maybe it's a good time to post a daily run-on or two here.

First things first. I've left the job at Yale, effective last week. Spent most of last week catching up on sleep, moving email archives around, leaving/joining lists from old/new accounts, and generally letting the transition take hold psychically. I don't really have much time to spare but in retrospect it was a good idea to allow some buffer time.

Here's the plan: I'm forming a non-profit corporation. Have the board and a lawyer and everything. The broad goal I want to work toward is making the net work much more like a big library. The purpose of the organization will be to seed/support a handful of projects which provide free pieces of that big global library infrastructure, starting with jake. The general shape of a project the company will take on is anything which enables use of functional metadata. By functional, I mean the explicit reorganization of (usually) bibliographic information into structures which can be generically useful in an unbounded range of software or publishing projects. In support of this, projects will have an information gathering component, a collective data diff/patch structure (i.e. open source data maintenance), and a collection of well-defined APIs and free code libraries for access.

I can explain this a bit further in the context of the jake project. The data in jake exists elsewhere... in MARC/AACR2 catalog records, in Ulrich's International Directory of Periodicals, in proprietary content services. But nowhere is this information (which is largely factual and therefore arguably public domain) either architected for modular use in a wide range of applications or freely available under an open source-style license. Basically MARC/AACR2 is difficult to hack because it lives in hard-to-hack access systems (Z39.50 doesn't scale well in today's implementations), its content rules are often implicit syntax and not explicitly tagged, and its useful metadata components (such as fields for ISSNs, ISBNs, and the like) often reference external naming systems whose content is only accessible under license (and usually not any more hackable).

For jake we're removing each of those problems by putting the information most generically useful for hacking journal access systems into a generic data structure with obvious hooks for other applications. We reference external identifiers but generate our own internally. And even though the project's only halfway to 1.0 there are people using it in ways we never predicted.
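To make the "reference external identifiers but generate our own" idea concrete, here is a minimal Python sketch. It is not jake's actual schema or code: the class names, field names, and the placeholder ISSN are all hypothetical, chosen only to show a record keyed by an internally generated id with external identifiers attached as lookup hooks.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TitleRecord:
    """Hypothetical journal-title record (illustrative, not jake's schema)."""
    internal_id: int          # generated locally, never borrowed from outside
    title: str
    # External identifiers (e.g. an ISSN) are referenced, not used as the key.
    external_ids: dict = field(default_factory=dict)


class TitleRegistry:
    """Hypothetical in-memory store that hands out its own identifiers."""

    def __init__(self):
        self._next_id = count(1)
        self._by_id = {}
        self._by_external = {}

    def add(self, title, **external_ids):
        record = TitleRecord(next(self._next_id), title, dict(external_ids))
        self._by_id[record.internal_id] = record
        # Index every external identifier so other applications can hook in.
        for scheme, value in external_ids.items():
            self._by_external[(scheme, value)] = record.internal_id
        return record

    def get(self, internal_id):
        return self._by_id[internal_id]

    def find_by_external(self, scheme, value):
        internal_id = self._by_external.get((scheme, value))
        return None if internal_id is None else self._by_id[internal_id]


registry = TitleRegistry()
rec = registry.add("Example Journal of Testing", issn="0000-0000")  # placeholder ISSN
same = registry.find_by_external("issn", "0000-0000")
missing = registry.find_by_external("issn", "9999-9999")
```

The point of the design is that a title stays addressable even when an external naming system is missing, wrong, or locked behind a license, while the external identifiers remain available for anyone who does have them.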

So that's the general idea. Libraries need to expose their data to the hacker community better. Hackers need to understand that much of the work of librarianship, such as authority control in cooperative cataloging, is an absolutely vital piece of the puzzle. By seeding a few projects that demonstrate this to both communities, hopefully the company will define a niche area where immediate collaboration is necessary.

Right now I'm setting up a dedicated jake site and migrating it out of Yale. It's going fairly well but there's a lot to deal with, including a site redesign, moving the data, rewriting code to build a cleaner query environment, and timing support requests to the very gracious provider so as not to interfere with their own internal hardware upgrade. Hopefully we can shoot for the end of the month for the new site and a 0.6 release.

So I'm working at home on this stuff, along with putting together paperwork for the company. It's funny deciding which lists to subscribe to at this point. Because I don't work in a library anymore I think a lot of the lists I used to follow aren't really germane to someone whose work now revolves around thinking of the net as one big library. :)

Hmm. Feels good to blabber on here.
