4 Aug 2002 raph   » (Master)

Proofs

I got some very nice email from Konrad Slind and Ken Friis Larsen about how to make portable proofs, and about HOL. HOL stands for Higher Order Logic, which basically means you can quantify over functions and predicates. You can't do this straightforwardly in first-order logics.

Given this power, you can make nice abstract proofs that don't depend on any particular model. Basically, your theorem is of the form "for all systems of numbers that satisfying these axioms: ...". If you want to apply the theorem to a specific model, you just plug it in, and prove each of the axioms as a theorem. The general technique is called "theory interpretation". In its most general form, it's a transformation of abstract proofs into concrete proofs expressed in terms of a particular model, but in a nice system you can do the deduction directly, by taking one instance of the (polymorphic) universal quantification. Theory interpretation is one of the ideas in QED.

HOL is based on ML, which is really a very nice language for this sort of thing. I'm not yet convinced that the whole business of "tactics" and "tacticals" are essential in a proof interchange format, but they do seem to be a good way to partially automate the process of producing proofs.

Document formats

jfleck took my bait. Ok, I will explain why I think structural markup is not always the best approach.

In the simplest model, an author creates a document in a document format, then transmits the file (now a sequence of bytes) to a recipient. The recipient is in possession of a "viewer", which is a transform from the document format into a human-readable presentation. The usual viewer displays the document on the screen or prints it out on the printer, but it doesn't have to be that way. In particular, blind people will generally want the text read aloud.

So now we have the issue of variability of the viewer transform. Some document formats (PostScript, say) nail down the presentation quite precisely, while others (HTML, say) might appear very different in different viewers. Which is better?

Many people take a religious position on this issue. But I prefer to look at it in the framework of presentation quality. You have a viewing context: paper vs screen vs audio, window size (or screen size), color vs monochrome, etc. These are all legitimate sources of variability. You also have gratuitous variability, very commonly different sets of available fonts. You can solve this variability a number of ways: restricting the allowable fonts in a document, using freely licensed fonts, or allowing "embedding" of the font's glyphs (but not the complete font) in the document. The latter option is interesting because it restricts editability.

Now you can ask the question: over the space of viewing contexts, what is the quality of the presentation. As always, there are many aspects to quality: aesthetic beauty, ease of reading text, ease of finding something in the text, consistency, etc. The art of graphic design is all about making a high quality presentation. A typical rule of thumb is to place a graphic either on the same page or the facing page as the text which describes it. But the rules of thumb often conflict, and it's a judgement call to decide which ones are more important.

And now we can address the question of structural markup vs a document format that emphasizes presentation. Especially for documents that fit the "structured" model, the former can produce reasonably good presentations across a wide variety of viewing contexts. PostScript, by contrast, can represent a stunningly beautiful presentation, but only in a very narrow context.

There are, of course, many other factors that affect quality. Of popular document formats, HTML is particularly bad when printed. It's good enough on the screen, though. In fact, it could be better for screen viewing than PostScript or PDF, because it could use screen-optimized fonts, while the latter often forces printer-optimized fonts onto the screen. In practice, though, Web browsers haven't spent as much attention on rendering quality as, say, Adobe Acrobat, so the advantage is only potential.

Thus, I hope I've conveyed why I don't think that pure structural markup is always better than pure presentation, or vice versa. The main trend seems to be toward hybrid approaches. For example, PDF 1.4 allows you to specify one layout with great precision, but also includes parallel more-or-less structural markup so that it can reflow the text for small windows, etc. This adds considerable complexity, but one can make the argument that it serves readers of the document well. Similarly, the purity of structural markup is often bent, incorporating directives and hints with the intent of improving the presentation (style sheets are one such technique). However, these hints almost always fall short of pinning down the exact layout of text on the page, so you really can't do as well in print as a presentation-centric approach.

word2pdf

The discussion of document formats reminds me of a wishlist item I haven't mentioned here before: a batch converter from Microsoft Word format(s) directly to PostScript or PDF, faithfully reproducing the original layout. Obviously, wvware goes part of the way, but the print output goes through TeX, so the formatting gets pretty badly mangled.

Doing it as a batch converter is a lot easier than a full GUI word processor. That makes it realistic to focus on faithfulness. Over time, the existence of such a batch renderer would be helpful for building high quality GUI editors, but in the near term, a lot of the time people just want to view or print the documents anyway.

I'm pretty sure that we (Artifex) could sell such a tool, as it fits in with the Ghostscript business, and we've already seen some customer interest. So, if anyone out there is seriously interested in working on something like this, get in touch.

(also see wvWare2 design document)

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!