Older blog entries for raph (starting at number 242)

Ghostscript

The HEAD branch has now been merged over the DeviceN branch, but the latter is not public yet. Well Tempered Screening works just fine after the merge, and I cleaned up some bugs. Among other things, the shading code was poking into the device halftone internals. Yucch.

What's the best primitive for formal math on the Web?

There are several systems of primitives that make sense as the basis for formal math. Metamath is based on ZFC set theory, but that's far from the only possible choice of primitive.

In fact, under the hood, Metamath is actually based on generic axiom systems. You don't have to start from ZFC if you don't want to; you can put in your own axioms. But then you can't use the proofs already done. This is the old problem of "runtime incompatibility" again.

Another basis seems to be predicate calculus. A major justification is that boolean-valued functions are basically equivalent to sets; the "comprehension" axiom in ZFC set theory relates the two. Dijkstra's formulation of weakest precondition semantics favors predicates over sets as the more primitive notion.
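
To make the correspondence concrete, here's a tiny Python sketch (my own illustration, not anyone's formal system): a boolean-valued function and the set it comprehends carry the same information.

    # The predicate/set correspondence: over a fixed universe, a boolean-valued
    # function and the set it "comprehends" determine each other.
    universe = range(10)
    is_even = lambda x: x % 2 == 0                 # the predicate
    evens = {x for x in universe if is_even(x)}    # the set it defines (comprehension)
    assert all(is_even(x) == (x in evens) for x in universe)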

Then, of course, you have the lambda calculus. This is, in many ways, more appealing because it's closer to real programs, and also has a more concrete flavor. The Church numerals are a particularly pretty way of encoding the natural numbers.
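
For the curious, here's a minimal Python sketch of Church numerals; the helper names are mine, just for illustration.

    # Church numerals: the number n is "apply f to x, n times".
    zero = lambda f: lambda x: x
    succ = lambda n: lambda f: lambda x: f(n(f)(x))
    add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    def to_int(n):
        # Count the applications to recover an ordinary integer.
        return n(lambda k: k + 1)(0)

    two = succ(succ(zero))
    assert to_int(add(two)(two)) == 4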

In fact, natural numbers alone are universal. You can encode anything you like as Diophantine equations. However, these encodings are really painful: they're interesting as logic or philosophy, but not practical for writing real proofs.

A good proof language for the Web may have the TIMTOWTDI philosophy, and include multiple primitives. Having more than one is redundant, but may make proofs considerably easier. For example, in any of these systems it's easy to imagine having numbers as a primitive, so you don't have to prove 2+2=4 every time you need to calculate.

So the criterion for including a primitive should be whether it makes proofs easier, or whether it's just as easy to do the proofs using some other primitive. I'm not sure what all the relations are here.

Another primitive I like is tree constructors, just like ML datatypes. I think that these are easier to work with than many other primitives, but on the other hand each datatype basically creates its own set of axioms, which keeps you from having the same kind of purity as in Metamath. It might be a reasonable tradeoff, though.

One reason I like tree constructors is that they keep types cleanly separated. In the ZFC formalization of mathematics, you can say {0} + {0, 1}, and it makes sense (the answer is 3, by the way). I'd like the concept of integers and the concept of sets to be obviously disjoint.
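
Here's a rough Python sketch of what I mean (standing in for ML datatypes; the constructor names are just illustrative): each constructor builds its own kind of tree, so a natural number can never be mistaken for a set.

    from dataclasses import dataclass

    # Nat ::= Zero | Succ Nat
    @dataclass(frozen=True)
    class Zero:
        pass

    @dataclass(frozen=True)
    class Succ:
        pred: object

    # Set ::= Empty | Insert elem Set
    @dataclass(frozen=True)
    class Empty:
        pass

    @dataclass(frozen=True)
    class Insert:
        elem: object
        rest: object

    two = Succ(Succ(Zero()))        # a Nat, and nothing but a Nat
    s = Insert(Zero(), Empty())     # a Set; "s + two" is simply an error, not 3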

Spam and trust

John Patrick writes: The Spam Has Got To Go. Amen to that.

But what's the solution? There are a lot of things you can try: legislation, informal sanctions, blacklists, spam filters, hashcash schemes, p2p spam reporting networks, etc. Many of these are being tried. Jon Udell and others suggest that digitally signing emails will cut down on spam, but it seems to me just one more barrier for spammers to overcome. Tap into an unlimited supply of digital id's, and you just send signed spam.

I think I know how to stop spam, while preserving the wonderful openness of email. Longtime readers won't be surprised to hear that my idea is based on attack-resistant trust metrics. In fact, longtime readers are probably sick and tired of me harping on this damn trust stuff all the time. But harp I will, for the message needs to get out there. There is now research on how to build spam-resistant messaging networks. Quite a bit more work is needed to make it practical, but I think the usual technique of paying smart people to think about it has a good chance of working.

Anyone who's interested should read Ch. 7 of my thesis-in-progress.

SourceForge

I had a nice interchange today with Jacob Moorman of SourceForge, about my miffed feelings regarding the language on their peer certification page. Actually, the peer certifications are going away, but more important, I get a renewed sense of an open channel with the SourceForge guys. In fact, complaining in my diary rather than contacting them directly was unfair, and I apologize.

I really hope SourceForge manages to succeed in spite of the current harsh business climate. They're doing an immense public service to the cause of free software.

A New Kind of Science

I power-skimmed the book again tonight. There's no question about Wolfram's ego. There's precious little actual science in the book, but I'm still convinced that there's something there. Exactly what, I'm not quite sure. My latest stab at expressing his central point is: "a lot of shit is universal".

Even so, I'm not really swayed by his arguments. One of his points seems to be that people thought about universal computation ability as the exclusive province of very complex, specialized equipment, while in fact, many simple systems can express universal computation. Sure, sure. But his Turing machine construction on rule 110 requires enormously complex and specialized software (the initial state), even if the hardware is ridiculously simple compared with previous universal constructions. If you look at the sum complexity of hardware and software, then there's a good argument to be made that Wolfram has taken a big step backwards.

Distributed proof repositories

AaronSw writes: "While hashes are useful, one thing you often hear in W3C circles is that local file storage should keep track of the URIs from which it got the files and cache them so. So you'd just look for the cached file with that URL instead of the one with that hash."

Glad to see the idea is resonating for other people. I'm not sure what UR? scheme is best for these proof files, but my hunch is that hashes will be important. For one, definitions, theorems, and proofs are probably a lot more immutable than most of the other objects W3C people deal with. I'm also envisioning a Napster-style propagation of referenced objects. In typical usage, I can imagine a person working on a proof in conjunction with a client which snarfs proofs and lemmas from the rest of the Web, then, when it's done, uploads everything to a static web space. Thus, URL's move around fluidly, but the hash is constant.
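
To illustrate the hunch, here's a small Python sketch of content-addressed proof objects; the URI scheme and function names are invented for the example, not something anyone has standardized.

    import hashlib

    def proof_uri(proof_bytes):
        # Name a proof by the hash of its canonical bytes, so any mirror can serve it.
        return "hash://sha1/" + hashlib.sha1(proof_bytes).hexdigest()

    cache = {}  # local store: hash URI -> bytes

    def fetch(uri, mirrors):
        # Check the local cache first; otherwise take the content from any mirror
        # that has it, verifying the hash before trusting it.
        if uri in cache:
            return cache[uri]
        for get in mirrors:        # each mirror is a callable returning bytes or None
            data = get(uri)
            if data is not None and proof_uri(data) == uri:
                cache[uri] = data
                return data
        raise KeyError(uri)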

I think I'll write about choice of primitives over the coming days, unless people beg me to stop.

Math

My regular readers are no doubt wondering what it was that Bram and I talked about yesterday evening that was so much fun it left my diary so inarticulate. I'll try to summarize.

Rigorously formal mathematics is interesting, but has a well-earned reputation for being too painful and tedious to be really practical for anything. So far, it's been much more the domain of philosophers than of engineers.

Bram and I feel that the Web has the power to change that. For the most part, tools for mechanically checking proofs have basically been solo tools. For one person to get through a significant slice of mathematics is an ordeal. But if the work is split up among many, it's a lot more reasonable.

Collaboration requires standard file formats. Bram and I are thinking along the lines of a low-level, simple formal proof format. The main thing is to apply inference rules. Then, there's a higher level metadata format for saying things like "this is a proof of Fermat's Little Theorem using the Peano axioms for natural numbers".

The design of such languages parallels the design of computer languages. Names and namespaces are particularly important. Proofs should be written as stand-alone modules, but also so that you can build up a structure by reusing the results in other proofs. Since these file formats are intended to live on the Web, links are critically important too. In particular, you want to be able to import other namespaces. Thus, in the example above, "the Peano axioms for natural numbers" would be a link.

A link can be a URL, but it can also be an immutable hash of the content. That way, if you have the file locally, you don't need to go to the net to resolve it.
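
Purely as a strawman (every field name below is invented), the higher-level metadata record might look something like this, with the axiom system and the low-level proof object both referenced as links:

    # A strawman metadata record; nothing here is a standard.
    proof_metadata = {
        "statement": "Fermat's Little Theorem",
        "axioms": "http://example.org/peano",            # an imported namespace, by URL
        "proof": "hash://sha1/<digest-of-proof-file>",   # the low-level proof, by content hash
        "encoding": "peano-naturals",
    }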

Typically, these formal systems have set theory as a primitive (Metamath uses ZFC). Other objects are encoded in the primitives. For example, the usual encoding of natural numbers is: 0 = {}, i + 1 = i ∪ {i}. But there are other ways of encoding numbers, some of which might make more sense. For example, if you're doing calculations, a sequence of bits might be better. Of course, all the encodings of integers are equivalent, but if two proofs use two different encodings, it might be hard to knit them together.
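
Here's that usual encoding spelled out as a little Python sketch (frozensets standing in for ZFC sets), just to show how concrete it is:

    def encode(n):
        # 0 = {}, i + 1 = i U {i}
        s = frozenset()
        for _ in range(n):
            s = s | frozenset([s])
        return s

    assert encode(0) == frozenset()
    assert encode(2) == frozenset([frozenset(), frozenset([frozenset()])])
    assert len(encode(4)) == 4    # each natural contains exactly its predecessors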

So I think it's important to make it easy to convert between these different conventions. There are a couple of techniques I can think of. First, you might be able to make inferences based on models: "if P is a proof of theorem T using encoding E0, and E1 is a model of E0, then (some simple transformation of T) is also true". Another way would be to parameterize proofs, so they take, say, the integers as arguments. Any encoding of the integers which satisfied the Peano axioms would do; to use a proof with a different encoding, you just prove the Peano axioms in your new encoding.

The higher-level format is useful for humans wishing to browse the space of proofs, and also for clients to automatically verify proofs. At the very least, you want to let them crawl enough of the proof web that the expansion of the proof is complete. I'd also expect the usual Web patterns to show up: search engines, directories, pages full of annotated links and commentary.

The exact choice of lower-level language is probably not that important. You can always use tools to translate more convenient proof formats into the lower-level one, or even to automate part of the process. By using a common lower-level format, people using different tools can still collaborate. One important test of this principle is whether the Metamath proofs can be so translated.

Formal logic for proofs is not new, of course. The idea of a proof repository isn't new, either. But I am unaware of anyone who has designed languages for formal proofs with the explicit goal of fostering collaboration through the Web.

This Introduction to Mathematical Logic may be rewarding to those intrigued by these ideas.

14 Jul 2002 (updated 14 Jul 2002 at 07:53 UTC)

I've been having a lot of fun discussing with Bram the ideas mentioned yesterday. Thinking about formal math in the context of the Web is a lot of fun.

Fitz Wiki

The Fitz Wiki actually has some interesting content now, although of course it's still very new.

Metamath

Bram brought up an intriguing idea: an online repository for theorems and proofs, in which the proofs are in a formal language and can be checked automatically.

This is in many ways an easier problem than automated theorem proving, an area that's soaked up a lot of research attention without all that much to show for it. Here, we optimize for proofs written by people.

Rigorously formal proofs have a reputation for being tedious and verbose. I'm not sure they have to be, but even so I think it makes sense to store a low-level proof format in which all steps are spelled out. People developing proofs might work in a higher-level language, and have it compiled down. An appealing consequence is that people can experiment with higher-level proof systems.

There are still a lot of challenges, though. For one, mathematicians are not able to agree on a set of fundamental axioms that are "right". In fact, the debates over various axioms run almost as hot as Perl vs Python.

For two, there are many different ways of encoding the same idea into mathematical logic. Plain-old unsigned integers, for example, readily admit Peano encodings, Church encodings, and so on. None of these logic-based encodings is at all efficient if you're dealing with big numbers, so you probably want another encoding as a sequence of binary digits. But if you prove "2+2=4" in one encoding, how can you use the result in another?
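
Here's a toy Python sketch of the problem (my own throwaway encodings): "2+2=4" is easy to state in either encoding, but a proof about one says nothing directly about the other.

    def peano(n):
        # Unary: 0 = (), succ(x) = ("S", x)
        return () if n == 0 else ("S", peano(n - 1))

    def peano_add(a, b):
        return b if a == () else ("S", peano_add(a[1], b))

    def binary(n):
        # Little-endian list of bits
        return [] if n == 0 else [n & 1] + binary(n >> 1)

    def binary_to_int(b):
        return sum(bit << i for i, bit in enumerate(b))

    assert peano_add(peano(2), peano(2)) == peano(4)                  # "2+2=4", Peano style
    assert binary_to_int(binary(2)) + binary_to_int(binary(2)) == 4   # "2+2=4", binary style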

I think this question is very analogous to runtime incompatibilities between various programming languages, or even within a single programming language. In C, you can have many different implementations of a simple list.

One of the best-developed formal proof repositories is Metamath. It deals with the runtime incompatibility problem by simply choosing one set of conventions and sticking with it. Overall, proofs don't look that scary, but I still think there are language design issues that get in the way of larger-scale use. The syntax is twisted a bit to look like math. For example, '+' looks like an infix operator. To me, it makes much more sense to write +(x, y) in the low level language, and possibly compile infix notation down to the lower level.
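
Compiling the infix sugar down is the easy part; here's a throwaway Python sketch that borrows Python's own expression parser (nothing to do with Metamath's actual syntax):

    import ast

    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def to_prefix(node):
        # Rewrite infix expressions into the +(x, y) form.
        if isinstance(node, ast.BinOp):
            return "%s(%s, %s)" % (OPS[type(node.op)], to_prefix(node.left), to_prefix(node.right))
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.Name):
            return node.id
        raise ValueError("unsupported syntax")

    print(to_prefix(ast.parse("x + 2 * y", mode="eval").body))   # -> +(x, *(2, y))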

It's pretty easy to prototype something like this. You want search and directory capabilities to get a handle on the potentially huge number of theorems, but that's all pretty well understood in the Web world. My guess is that we'll start to see more experimentation in this area, and possibly a useful proof repository within a few years.

Fitz

The mailing list and Wiki for Fitz are now instantiated, thanks to rillian. If you're interested, you might want to check the archives; they contain some relevant discussion.

The next step is to populate the Wiki. I'll try to write a little something every day, just like I do here.

Anthrax

There's something fishy going on with the Anthrax investigation. All the evidence points to the culprit being someone hooked into the US defense establishment. At the same time, the FBI's investigation has been laughably incompetent.

Then, most remarkably, the New York Times published an editorial (reg required) talking about an individual who many believe is the culprit, but who has barely been investigated. The editorial itself doesn't name the name, but a few minutes searching on Google reveals it to be almost certainly one Steven J. Hatfill.

If this is true, it's a pretty big story. For one, it drives home the need for transparency and accountability in our government. The defense establishment has been largely exempt, because we have generally trusted that they need their secrecy to protect us from the bad guys. But if they are the bad guys, everything changes.

This story has been reported very differently in the US and overseas media. Aside from this one editorial in the NYTimes, I've been able to find almost no coverage in the US mainstream press. If this is "objective journalism", fuck it, I'd much rather have Pravda. Fortunately, we have a vital independent press on the Web. I've been tracking this story through rotten.com.

I don't know that Hatfill did it, but it seems like by far the best lead the FBI has. For them not to be pursuing him vigorously, and for the US mainstream media not to be reporting the story, verges on the criminal.

Well Tempered Screening

I got my first color output today. Wanna see a sample? (note: 1.4M)

Also see these fun bug images: 1 2

Fitz

Between Antony and tor, the discussion is really heating up. I should set up a real mailing list, so it gets properly archived and searchable and so on.

Google

AaronSw pointed out that searching on "advogato" in google yields a "Google is hiring" adword. Considering the otherwise absolute radio silence from them, it's a very cool bit of feedback.

Ghostscript

Woohoo! I got my first Well Tempered Screening output this morning. If you like, you can take a look at before and after. If you want to print them, try: pngtopnm golfer.png | pnmtops -equalpixel -dpi 300 | lpr

WTS solves three problems. It replaces the (possibly objectionable) patterning artifacts with a finely dispersed noise. It renders screen angles extremely precisely, where Ghostscript's existing halftones, even in "AccurateScreens" mode, are much less accurate than Adobe's. Finally, Ghostscript's existing halftoning uses a "tile cache"; as the halftone cell grows in size, it can overflow that cache, and performance can degrade horribly. WTS, by contrast, computes thresholds on the fly.
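
For readers who haven't met halftoning before, the basic operation is just a per-pixel comparison against a threshold array; this little Python sketch shows that operation in general (it is not the actual WTS code):

    def halftone(gray_row, threshold_row):
        # Gray and threshold values are 0..255; output 1 = ink, 0 = paper.
        return [1 if g < t else 0 for g, t in zip(gray_row, threshold_row)]

    print(halftone([10, 128, 240, 64], [96, 96, 160, 200]))   # -> [1, 0, 0, 1]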

There's more work to do, but I'm quite pleased.

Trustworthy names

Name services are an important battleground for trust. The question, "who deserves the right to name X?" is a deep one, and the answer is far more social than technical.

The way DNS is being run now sucks sucks sucks. One symptom is that there are no less than four services offering non-global DNS root service.

Much of the original motivation for my thesis work was a desire to build a better name system. It's a hard problem, though. More research is needed, or so I'm fond of saying.

What if you used the eigenvector metadata engine? Assertions would be of the form "name X belongs to public key 0x12345". When more than one such assertion exists for any given name, the one with the higher confidence value wins.
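
As a toy Python sketch of that resolution rule (the confidence values would really come out of the trust metric; here they're just made up):

    from collections import defaultdict

    assertions = defaultdict(list)   # name -> list of (confidence, public_key)

    def assert_binding(name, public_key, confidence):
        assertions[name].append((confidence, public_key))

    def resolve(name):
        # The assertion with the highest confidence value wins.
        if not assertions[name]:
            return None
        return max(assertions[name])[1]

    assert_binding("example", "0x12345", 0.9)
    assert_binding("example", "0xdeadbeef", 0.4)   # a competing, less-trusted claim
    print(resolve("example"))                      # -> 0x12345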

Obviously, this has the risk of "balkanizing" the namespace. Certain "controversial" names would resolve differently from community to community. While this isn't a serious problem for RealNames-like keyword searching, it is a problem if you're trying to publish stable URLs.

But here's the big question: even though the system is capable of balkanization, would it be a problem in reality? It's not clear to me. People can make local decisions to reduce the ambiguity in resolving names. For one, "registrants" can choose names less likely to be challenged. For two, when choosing the trust links for your community, you can emphasize people who take the game seriously and try to bind names to the rightful owner.

If the controversy can't be resolved, then the name is simply not useful as a global resource. Is this so bad? There are certainly a ton of names owned by squatters now that can't be used, so it's not clear that the proposal would be any worse.

I'm not convinced that this is the best naming scheme. I personally favor first-come, first-served naming policies, and my PhD research goes into some detail about how to implement those. But I do think it's worth thinking about alternatives. What we have now isn't very good, and it isn't going to get any better by itself.

MuPdf and Fitz

I took a look at tor's MuPdf. It's very unfinished, but it's still very impressive. The unhinted, antialiased text is very pretty, and performance seems to be excellent. In many ways, it's a model for what I want Fitz to be. In many other ways, Ghostscript is the model, because it's already solved the problem of rendering the full PDF imaging model.

Antony Courtney has posted an architectural goals document for Fitz, with edits from me. There's a lot more I want to write, but I don't have big gobs of time right now. Also, it's obviously getting near time to set up the public infrastructure for the project: CVS and mailing lists as bare minimum, but I also want a Wiki.

Two degrees of separation

Gene Kan died last week, probably a suicide. Sad.

Ghostscript

Developer release 7.21 is out. We're going to merge in the DeviceN code next, so this is probably an "island of stability".

Deep thinking vs not

Bram has been thinking deeply about how to efficiently move around large quantities of data in a robust p2p way. The result is BitTorrent, which is now in a good beta release.

By contrast, here we see "oopster", which is a specialization of Jxta for moving around the OpenOffice binaries (which need this kind of thing because of bloat, but that's another story). They're just beginning to think about the problems that Bram has already solved.

In the thinking world, one of the great applications of BitTorrent will probably be ISO's of free software distributions.

tromey: Cool.

sej: What you are proposing (adding transparency to PostScript) is adding nonstandard extensions to PostScript. It's possible, but I'm not sure what the real advantage is over PDF.

Jeff Darcy has written an interesting piece on protocols that contain reimplementations of TCP. Looking through the archives, Jeff has a lot of good stuff on protocols. Designing good protocols is an esoteric art, and should probably get more attention. It's nice to see writing in the open, even though Jeff works for a proprietary company.

Roughly, Jeff is to protocols as David McCusker is to trees. I think we should do a Vulcan mind-meld of the two, because I think an efficient protocol for remote access to trees is interesting. It's a central component of the Athshe vaporware project, but I freely admit that there are many aspects I don't know how to do right. Maybe in the absence of mind-meld technology, we can just bounce ideas off each other.

Dirt and noise followup

I wrote a few days ago about "dirt and noise". One of the things I was thinking about is the demonstration in Wolfram's book that the simple "rule 110" automaton can run a universal Turing machine. The straightforward implication is that, even for such ridiculously simple automata, knowing the initial state doesn't let you predict whether a certain pattern (corresponding to the halting of the Turing machine) will ever occur. Simple universal machines are not new, but Matthew Cook's proof that rule 110 qualifies certainly pushes the simplicity envelope.

But here's a question. What if your cellular automaton didn't proceed purely mathematically, but had occasional bit errors? Does this make the halting problem go away? If so, does predicting the presence or absence of patterns become tractable?
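
For anyone who wants to play with the question, here's a small Python sketch of rule 110 with an optional bit-error rate (the grid width and noise level are arbitrary choices of mine):

    import random

    RULE = 110

    def step(cells, noise=0.0):
        # One synchronous update on a ring; each output bit flips with probability `noise`.
        n = len(cells)
        out = []
        for i in range(n):
            pattern = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
            bit = (RULE >> pattern) & 1
            if random.random() < noise:
                bit ^= 1
            out.append(bit)
        return out

    cells = [0] * 64
    cells[32] = 1
    for _ in range(20):
        cells = step(cells, noise=0.001)
        print("".join(".#"[c] for c in cells))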

Incidentally, the best review I've seen so far is this one, posted on the MAA website.

The book may be more interesting as art or literature than as science. Certainly, the argument for deep equivalence in all fields of human intellectual endeavor is at the heart of Magister Ludi (Das Glasperlenspiel), by Hermann Hesse.

