22 Jul 2006 apenwarr   » (Master)

Grand Old Unified Relational Documentation

I'm pleased that what is probably my last major innovation on behalf of NITI, the new Knowledgebase system codenamed GourD, has finally been opened to the public. Even though the code was rewritten almost from scratch after I last had my dirty fingers in it. Thank chrisk for that - it's much better now.

But I still claim credit for the idea, and even if I don't, it's a cool idea anyhow. Among the features of the new system are:

  • It's based on mediawiki, so its UI is designed for collaboration.
  • We greatly improved the search engine, whose ranking and summarization algorithms are better than any other knowledgebase I've seen (and a major improvement over standard mediawiki, too).
  • It has our whole searchable user manual in it, not just individual knowledgebase articles.
  • It has a really neat AJAX "rate this article" system that's actually slightly less obnoxious than other such systems.
  • You can subscribe to articles or whole categories, and get a nightly notification of which articles have been newly published or changed in a non-minor way: this is less confusing for newbies than RecentChanges type pages.
  • It automatically produces a remarkably accurate and useful "similar articles" list based on a variant of my document correlation algorithm that I wrote about so long ago.

But my favourite part is really:

Separation of style, content, and... structure?

So everybody knows that the big glamourous thing nowadays is HTML4, XML, CSS, and so on, the point of all of which are to be able to separate "style" from "content." That is, someone correctly noticed that most writers are poor layout artists and vice versa, so why not do the jobs separately instead of what computers normally encourage you to do, which is spend more time fiddling with fonts in Word than actually writing?

Okay, so that's sort of worked out for some people, and we get a bit closer to the ideal every time HTML makes another evolutionary step.

But anyway, what if we took it one step further? What if we split out the high-level structure of your documentation, and made it somebody else's job?

What? Okay, this sounds a bit confusing. But think of it this way. The world didn't just lose layout experts when it started forcing writers to format their own documents; the world is also losing professional copy editors as more and more people can publish their work directly and have it look half-decent. That means more and more documents are... kind of crappy.

Think of a big 500-page technical manual (our product has one, for instance). Nowadays, those 500 pages won't be written by a single person; not at all. They're written over the course of many years by several different people, where some of those people are actually joining and leaving the company while the book is evolving. The result? Inconsistent drivel, like almost all modern technical documents have become.

The rare exceptions are documents that are written quickly enough, by a single person with a good sense of style, that they're actually good. Except they rapidly get outdated and can never be very long (ie. complete and thorough) because both of those preclude having a single writer enforce his/her artistic integrity; artistic integrity requires a pass over the entire document, by a single person, in a relatively short time.

Incidentally, all this is why books like the "for dummies" series are so popular. They're written by one or two authors in a relatively short time, edited by professional copy editors, laid out by professional layout artists, and rewritten from scratch every few editions. That produces (comparatively, at least) high quality work that really beats what most companies produce.

So what I observed from all this was simple: most KB systems already abstract style from content; generally, you enter plaintext and they format it like a KB article automatically. Good, so your layout artist isn't the same as your tech writer.

But I've never seen a technical documentation system that separates content from structure. That is, there are generally a variety of writers - which is mostly unavoidable in a modern long-term development project, unless you want to rewrite from scratch all the time, which is horrendously inefficient. But there is no editor. So the writers tend to each contribute a new chunk of prose (if you're lucky, it's a whole chapter), and then some unlucky soul - possibly one of your writers - gets to tack it into the big unwieldy book somewhere, until the big unwieldy book gets so big and unwieldy that nobody could possibly read it from cover to cover, then nobody even reads it at all, then you stop including printed copies with your product and just give them a pdf, and then you kind of stop shipping the pdf and just leave it on your web site, and then eventually people don't read documentation at all and resort to forums and knowledgebase searches.

Sound familiar?

Well, GourD has the solution. Documentation is written in convenient, bite-sized chunks, by anyone who wants to contribute it. These start off life as searchable, categoriezed knowledgebase articles with various writing styles. But then our "artistic integrator" (currently apm) can go through and clean up any stylistic inconsistencies between articles - in articles that matter.

She then has the ability to create "books" out of a collection of similarly-styled articles. Where necessary, she can edit articles to make them read better in the sequence of the book. When the articles are not in book form, these changes don't disrupt anything because any confusion can be resolved by just hyperlinking terminology from one article to another or using the "similar pages" or "categories" features; welcome to the web.

The "structure" that the artistic integrator adds is through the process of weaving together a collection of articles. Imagine you had a big bucket of pearls collected over a period of time. To make a good necklace, someone has to choose pearls of similar size and shape and assemble them in the appropriate sequence, maybe polishing a few here and there. The result is beautiful, but not because the person who made the necklace spent dozens of clam-years growing pearls; no, the pearls were produced by someone else, and the final beauty comes just from combining them the right way.

You can produce a series of 100-page books on useful topics just by mixing and matching and polishing GourD (it's much more than just a "knowledgebase" when used like this) articles - and you can do it very quickly, and the artistic integrator doesn't have to write them all from scratch, so it's highly efficient and high parallelizable.

And your books will be better than any of your competitors'.

Wouldn't this be a great way to write documentation for Open Source projects?

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!