Older blog entries for raph (starting at number 248)

rillian: I'm not really trying to argue that the AFPL is a superior license to the GPL. The question is whether AFPL licensed software is, literally, an unspeakable evil. I think there's a pretty strong case that it's not.

In any case, the GPL won and the AFPL lost. As a result, software is freer, but it's harder to make a living doing it. On balance, this is probably a good thing. Anyway, it's certainly not the case that the AFPL by itself would guarantee developers a good living wage. In fact, the model of GPL'ed libraries + proprietary licensing seems to work as well as the AFPL, if not better.

In any event, the discussions with RMS continue.

Philanthropy

Now that the vast majority of "business models" for making money doing free software have been discredited, good ole fashioned philanthropy is starting to look more like a viable option. You could even say that a good deal of free software has been financed this way - of the billion dollars or so the US spends subsidizing advanced computer science research, at least a percent gets embezzled towards free software development. In our world, that's good money.

I'll note that Advogato itself is supported in this way: the server is hosted at Berkeley's XCF. I'm not sure exactly who pays for the bandwidth, but it's probably some grant or other.

Recently, I've noticed quite a few software projects and services are begging for money: OPN, kuro5hin, LWN on the services side, and Dillo as one example of software.

I think this is actually a reasonable model. The money brought in from voluntary contributions is very small by proprietary standards, but maybe enough to keep a lean, mean free software project going.

The biggest problem with the model is the lack of connection between merit and funding. The main factors in allocating funds are fundraising ability and visibility. Again, I see this as just a fact, rather than something that needs fixing (or can be fixed).

But here's a thought I had today. How much money do you think Microsoft spends preventing free software? The roughly half-million "donated" to Peru almost certainly falls into this category. If there were an equal amount allocated among the most worthy projects, it would make a big difference.

Meetings

I met with my advisor, Alex Aiken, today, and had dinner with Peter Deutsch (I'm staying in his guest cottage as I type). Tomorrow, I'll attend the seminar Peter is giving at Adobe, on the history of Ghostscript, and will probably meet lots of Adobe folks. Monday is our Artifex staff meeting. I don't consider myself much of a people person, but I'm enjoying the contact and the sense of connection. Even so, I'll be glad when things settle down and I can concentrate on coding again.

San Diego

Quaker meeting was good. The kids, especially, had a good time in San Diego, and (mostly) didn't get fried or too melted down.

In Pacific Yearly Meeting, I'm on Secretariat Committee, which basically means I put out the daily newsletter during meeting. This is a great way to get to know lots more people, especially for someone who feels as new and shy as I did when I started. Also, I like doing publishing.

I used LaTeX, mostly because I'm so familiar with it. Almost all the stuff coming in electronically was in Microsoft Word. I just converted everything to text (on the Windows box) and redid the formatting by hand.

It's always humbling to be reminded how difficult computers are for people. There was a lot of tech support provided in the Secretariat office last week. A lot of the problems were things that should have just worked. One of the more frustrating ones was a brand new iBook running Mac OS X, which crashed every time you tried to add a new printer in the "Print Center". Eventually, someone looked through the console logs and found that the culprit was an unresolved symbol in a Canon printer driver bundle. Since we didn't care about Canon printers at all (the printer was an Epson C60), deleting the bundle solved the problem. How the hell are people supposed to figure that out? It took three or four clueful people, and quite a bit of time.

Richard Stallman

Since OSCON was happening concurrently, I managed to meet a few free software people. Tuesday evening, Richard Stallman came by USD to talk to me. I'm still wrestling with the issues discussed. I deeply respect and admire Richard's vision and his principled approach, but there are also some things I simply do not agree with, and that is causing tension.

Essentially, the problem is that the GNU Ghostscript web pages and release mention the AFPL version, which goes against the GNU coding standards. According to those, it's ok to mention "well known" programs like Windows and OS X, but not to recommend them, or to mention less well known programs, such as AFPL Ghostscript.

I have two problems here. First, I don't believe in withholding information. Since Ghostscript is continually improving, the AFPL releases really are better than the GNU ones. In particular, if people are having problems, moving to the AFPL release might be a good way to solve them quickly. Also, in this age of Google, holding back on information is rather futile.

Second, I don't consider the AFPL version to be unethical. The primary freedom given by the GPL but not the AFPL is the right to sell the program for money without compensating the author. The other important rights, such as use, modification, and improvement without discrimination, are intact, as is gratis distribution.

It is intriguing to imagine how things might have turned out differently if the AFPL had taken hold, rather than the GPL. For one, it would have addressed a big problem in free software, the lack of compensation for developers. Red Hat-style distributions would be at a disadvantage, because they'd have to find the authors and pay them, and Debian-style distros would probably have a wider role. I think that the community spirit of free software could work under the AFPL as well.

As it is, the question is basically moot, largely because the GPL community is huge, while there is not much of an AFPL community aside from Ghostscript. Both licenses are thorny and prickly when it comes to intermixing with other licenses, so it's really just as well that only one is popular. Otherwise, there's a pretty big risk of dividing the community. That looked like a real risk when the NPL came out, but again, the GPL won.

In any case, I promised Richard that I would fix up the current GNU release. However, I realize I don't really have my heart in being a GNU maintainer. There are a few possibilities. One is for some other volunteer to step up to the plate. Another is to amicably part ways with the FSF and do GPL, but not GNU, releases of Ghostscript in the future. This would be a shame, because I feel that GNU is a worthy project and needs support. Fortunately, there's no urgency; there's plenty of time to figure out the Right Way.

I tried to talk to Richard about the GNUstep project and its relation to Mac OSX, but he's not following the project. I find this quite disheartening. I feel that Richard was far more effective as a leader when he was actively designing and writing the software to make the GNU project real.

Ankh, simonstl, and Paul Prescod

I had considerably more fun chatting with Ankh and simonstl on Saturday evening, and meeting Paul and his wife Lilia at the San Diego Zoo with the family on Sunday. Our conversations were deep and wide-ranging. I am more and more convinced of the importance of personal connections. In particular, it's really important to have friends inside the W3C, especially because I've been so critical of the organization. In the past, DV has filled that role, but he's since gone to Red Hat. Now, I feel that if I really have something to say, Ankh will make sure it gets heard by the right people.

Quaker Meeting

I'm at Pacific Yearly Meeting. It's a great family time together, and I'm really enjoying building and renewing my ties to other Quakers in the area.

Advogato DNS

Sorry about the outage over the past couple of days. UC Berkeley changed the IP addresses, and, being on vacation, I managed to goof up the DNS updates. It should be fine now.

I think I want to figure out a better process for getting the updates.

RMS

Richard Stallman, in town for OSCON, stopped by a couple of evenings ago, largely to ask me to remove references to AFPL Ghostscript from the GNU webpage and in-release documentation. He has an amazing presence, and I want the GNU release to be consistent with his vision. I still have some concerns, though. For example, sharing the bug tracking between the GNU and AFPL versions has obvious advantages. Would that be an unacceptable compromise? I'm not a big fan of withholding information, but that seems to be what the new GNU coding guidelines require, in order to avoid promoting non-free software.

I also enjoyed meeting Bradley Kuhn and Henri Poole (of the Affero project). I'll probably write more about the meeting; there was some interesting discussion.

Fitz

My Fitz prototype is now released.

It renders the tiger, but otherwise is more of a code sketch. Perhaps some people will find the Python bindings interesting.

Well Tempered Screening

I sent a snapshot to our internal mailing list. It's getting to the point where it actually works and people might want to use it. A bit more cleanup, though.

I'm trying to wrap things up before my trip to San Diego, so I'm not wrapping my mind around more speculative things like proof systems right now.

San Diego

I won't be at OSCON, but as it turns out I'll be in San Diego from Monday through Saturday next week anyway, for Pacific Yearly Meeting of the Religious Society of Friends (Quakers).

I might be able to get away for a few hours. It would be really fun to meet with friends. I probably won't be in ready phone or email contact there, but get in touch and we'll see if we can set something up.

Proofs

I think that computer-checked proofs are inevitable, but not particularly soon. None of the existing formal proof projects are Web-aware in the sense of fostering distributed collaboration. The big step, I think, will have to be taken by someone who understands the Web deeply, and enough formal logic to get by. I have the feeling that huge technical prowess is not necessary. Making a usable system doesn't have much to do with "corner cases" in logic.

As Bram points out, I think the big technical issue is making proofs robust with respect to the choice of such details. In existing systems, they seem to be fairly brittle. For example, Metamath uses the standard set-theoretic encoding of numbers. Theorems in number theory hold no matter what encoding you choose, or even if you use an axiom system rather than an encoding; it shouldn't matter which axiom system you use, either. In a sense, number theory is about a "Platonic" ideal of numbers.

But you don't see that from looking at the proofs in the Metamath database. There, proofs are in terms of the specific encoding. Trying to adapt these proofs to a different encoding would probably be difficult, and would more likely than not involve digging into the proof itself.

For example, this is a theorem in the standard set-theoretic encoding of naturals: for all x and y, x >= y <=> x = x \union y. However, in some other encodings it will be false, and in others it won't even be meaningful. Somehow, you have to show that the proof is respecting the abstraction, not piercing it and accessing the encoding. Metamath, while brilliant in many other respects, doesn't seem to have an easy way of doing this.
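
To make that concrete, here's a quick Python sketch (my own toy code, with frozensets standing in for sets) that checks the encoding-specific statement on small naturals. It only makes sense under the von Neumann encoding, which is exactly the point.

    # Von Neumann encoding: 0 = {}, n+1 = n union {n}.
    def von_neumann(n):
        s = frozenset()
        for _ in range(n):
            s = s | frozenset([s])
        return s

    # Check x >= y <=> x = x union y for small naturals.
    for x in range(6):
        for y in range(6):
            assert (x >= y) == (von_neumann(x) == von_neumann(x) | von_neumann(y))
    print("the equivalence holds for all x, y < 6")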

In logic, "any theorem over naturals in one encoding is also a theorem in any other encoding" is known as a meta-theorem. Ordinarily, you can apply a theorem using plain old substitution (and, usually, some constraints that variables are disjoint). However, the rules for meta-theorems are hairier.

I'm convinced, though, that the understanding of the Web must run deep, or the project has little hope. In fact, there seem to be two fairly large-scale projects to formalize math in a machine-checkable form, both initiated by mathematicians, and neither with a high profile today. One, Mizar, has nonfree software and uses the structure of a refereed journal. Another, QED, doesn't look like it's been updated in a while. I haven't seen anyone even try to adapt a proof done in one system into another.

I should also mention NuPrl, which is GPL. The focus is a bit more on automated theorem proving, although they do seem to have a nice library of proofs. Also, NuPrl is based on constructive logic.

A good blog

Phil Windley is the CIO for the State of Utah. His blog is well worth reading. We need more people like him in our government.

Twelf

Antony Courtney points me to Twelf, a formal proof system from CMU. It looks interesting, although I haven't dug into it at all. For one, it's pretty difficult to figure out what license the software is released under.

Trustworthy metadata

I had a nice chat last night with Bram about the eigenvector metadata engine. He suggests that using medians rather than means as the way of combining ratings would be more attack resistant. I think he's right about that. With medians, if more than half of the confidence value comes from "good" nodes, then the result is guaranteed to be within the range of ratings published by "good" nodes. In particular, if "bad" nodes publish extreme high or extreme low ratings, then with a median they won't affect the outcome much, while with a mean they can.
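
Here's a toy Python comparison (numbers invented) of the two combiners:

    # Ratings are (value, confidence) pairs. Good nodes hold 60% of the
    # confidence and rate in the range 6..8; bad nodes hold 40% and rate
    # an extreme 10 to drag the result up.
    ratings = [(6, 0.2), (7, 0.2), (8, 0.2), (10, 0.4)]

    def weighted_mean(ratings):
        total = sum(w for _, w in ratings)
        return sum(r * w for r, w in ratings) / total

    def weighted_median(ratings):
        total = sum(w for _, w in ratings)
        acc = 0.0
        for r, w in sorted(ratings):
            acc += w
            if acc >= total / 2:
                return r

    print(weighted_mean(ratings))    # 8.2 -- dragged outside the good nodes' range
    print(weighted_median(ratings))  # 8   -- stays within the good nodes' range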

Bram also reminded me of his post from over a year ago. His design is actually fairly similar to the metadata engine as currently implemented in Advogato, but with some twists. For one, it may help with scaling, because the confidence values go down linearly to 0 (at which point they need not be propagated further), while in the eigenvector formulation they tail off exponentially.

The analysis of his design is going to depend greatly on how the "distrust levels" are chosen. The subgraph of zero-distrust edges has no attack resistance among strongly connected components, so you probably want to avoid those. With distrust p, assertions travel over at most 1/p hops. If your distance to a "bad" node is greater than that, you only get recommendations from good nodes. That's not at all a bad property, although it's still a very good question how well it would work in practice.
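
A tiny Python sketch of that linear decay (graph and numbers invented): confidence drops by p at each hop and propagation stops at zero.

    from collections import deque

    def propagate(graph, source, p):
        conf = {source: 1.0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            next_conf = conf[node] - p
            if next_conf <= 0:
                continue  # out of confidence; stop propagating here
            for nbr in graph.get(node, []):
                if next_conf > conf.get(nbr, 0.0):
                    conf[nbr] = next_conf
                    queue.append(nbr)
        return conf

    chain = {i: [i + 1] for i in range(10)}
    print(propagate(chain, 0, 0.25))
    # {0: 1.0, 1: 0.75, 2: 0.5, 3: 0.25} -- the assertion dies out within 1/p = 4 hops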

It's really nice to have somebody to talk to who understands this stuff as deeply as Bram.

Expert witness

If you, or someone you know, is interested in being an expert witness in the area of database schema design, let me know. I was an expert witness for a halftone screening patent lawsuit a few years back, and I'm really glad I had the experience. I learned a lot about the way IP law works, and the pay can be pretty good too.

Kids

Alan's anxiety is much, much better now, and he seems overall happier.

Watching Max develop language ability continues to be a wonder. Typical sentences include "I take a picture at you, mommy" (holding a camera), "Put it on top your nose", and "I can't see his tail" (a cat in his favorite I Spy board book, with the tail indeed obscured). He's also starting to get into video games, including Galactic Patrol.

Ghostscript

The HEAD branch has now been merged over the DeviceN branch, but the latter is not public yet. Well Tempered Screening works just fine after the merge, and I cleaned up some bugs. Among other things, the shading code was poking into the device halftone internals. Yucch.

What's the best primitive for formal math on the Web?

There are several systems of primitives that make sense as the basis for formal math. Metamath is based on ZFC set theory, but that's far from the only possible choice of primitive.

In fact, under the hood, Metamath is actually based on generic axiom systems. You don't have to start from ZFC if you don't want to; you can put in your own axioms. But then you can't use the proofs already done. This is the old problem of "runtime incompatibility" again.

Another basis seems to be predicate calculus. A major justification is that boolean-valued functions are basically equivalent to sets. The "comprehension" axiom in ZFC set theory relates the two. Dijkstra's formulation of weakest precondition semantics favors predicates over sets as the most primitive operation.
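
Spelled out (in the restricted form ZFC actually has, one axiom per predicate phi), the comprehension schema is what lets a boolean-valued predicate carve a set out of an existing set:

    forall z exists y forall x (x \in y <=> (x \in z and phi(x)))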

Then, of course, you have the lambda calculus. This is, in many ways, more appealing because it's closer to real programs, and also has a more concrete flavor. The Church numerals are a particularly pretty way of encoding the natural numbers.
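
For fun, here's the idea in a few lines of Python (my own toy rendering): a Church numeral n is just the function that applies f to x exactly n times.

    zero = lambda f: lambda x: x
    succ = lambda n: lambda f: lambda x: f(n(f)(x))
    add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    def to_int(n):
        return n(lambda k: k + 1)(0)

    two = succ(succ(zero))
    print(to_int(add(two)(two)))  # 4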

In fact, natural numbers alone are universal. You can encode anything you like as Diophantine equations. However, these encodings are really painful. They're useful as logic or philosophy, but not really useful for making real proofs.

A good proof language for the Web may have the TIMTOWTDI philosophy, and include multiple primitives. Having more than one is redundant, but may make proofs considerably easier. For example, in any of these systems it's easy to imagine having numbers as a primitive so you don't have to prove 2+2=4 from scratch every time you need to calculate.

So the criterion for including a primitive should be whether it makes proofs easier, or whether it's just as easy to do the proofs using some other primitive. I'm not sure what all the relations are here.

Another primitive I like is tree constructors, just like ML datatypes. I think that these are easier to work with than many other primitives, but on the other hand each datatype basically creates its own set of axioms, which keeps you from having the same kind of purity as in Metamath. It might be a reasonable tradeoff, though.

One reason I like tree constructors is that they keep types cleanly separated. In the ZFC formalization of mathematics, you can say {0} + {0, 1}, and it makes sense (the answer is 3, by the way). I'd like the concept of integers and the concept of sets to be obviously disjoint.
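
Here's a toy Python sketch of that contrast, with dataclasses standing in for ML datatypes (names are mine):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Zero:
        pass

    @dataclass(frozen=True)
    class Succ:
        pred: "Zero | Succ"

    def add(m, n):
        return n if isinstance(m, Zero) else Succ(add(m.pred, n))

    one = Succ(Zero())
    two = Succ(one)
    print(add(one, two))  # Succ(pred=Succ(pred=Succ(pred=Zero()))), i.e. 3
    # add(frozenset([0]), two) is simply not meaningful here (in ML it wouldn't
    # even typecheck), whereas in the ZFC encoding {0} + {0, 1} happily comes out to 3.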

Spam and trust

John Patrick writes: The Spam Has Got To Go. Amen to that.

But what's the solution? There are a lot of things you can try: legislation, informal sanctions, blacklists, spam filters, hashcash schemes, p2p spam reporting networks, etc. Many of these are being tried. Jon Udell and others suggest that digitally signing emails will cut down on spam, but it seems to me just one more barrier for spammers to overcome. Tap into an unlimited supply of digital id's, and you just send signed spam.

I think I know how to stop spam, while preserving the wonderful openness of email. Longtime readers won't be surprised to hear that my idea is based on attack-resistant trust metrics. In fact, longtime readers are probably sick and tired of me harping on this damn trust stuff all the time. But harp I will, for the message needs to get out there. There is now research on how to build spam-resistant messaging networks. Quite a bit more work is needed to make it practical, but I think the usual technique of paying smart people to think about it has a good chance of working.

Anyone who's interested should read Ch. 7 of my thesis-in-progress.

SourceForge

I had a nice interchange today with Jacob Moorman of SourceForge, about my miffed feelings regarding the language on their peer certification page. Actually, the peer certifications are going away, but more important, I get a renewed sense of an open channel with the SourceForge guys. In fact, complaining in my diary rather than contacting them directly was unfair, and I apologize.

I really hope SourceForge manages to succeed in spite of the current harsh business climate. They're doing an immense public service to the cause of free software.

A New Kind of Science

I power-skimmed the book again tonight. There's no question about Wolfram's ego. There's precious little actual science in the book, but I'm still convinced that there's something there. Exactly what, I'm not quite sure. My latest stab at expressing his central point is: "a lot of shit is universal".

Even so, I'm not really swayed by his arguments. One of his points seems to be that people thought of universal computation as the exclusive province of very complex, specialized equipment, while in fact many simple systems can express universal computation. Sure, sure. But his Turing machine construction on rule 110 requires enormously complex and specialized software (the initial state), even if the hardware is ridiculously simple compared with previous universal constructions. If you look at the sum complexity of hardware and software, then there's a good argument to be made that Wolfram has taken a big step backwards.
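
For anyone who hasn't played with it, the rule 110 "hardware" fits in a few lines of Python; all the interesting complexity goes into choosing the initial cells.

    RULE = 110  # binary 01101110: new cell = bit (left*4 + center*2 + right) of the rule

    def step(cells):
        n = len(cells)
        return [(RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
                for i in range(n)]

    cells = [0] * 40 + [1]  # a single live cell
    for _ in range(20):
        print("".join(".#"[c] for c in cells))
        cells = step(cells)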

Distributed proof repositories

AaronSw writes: "While hashes are useful, one thing you often here in W3C circles is that local file storage should keep track of the URIs from which it got the files and cache them so. So you'd just look for the cached file with that URL instead of the one with that hash."

Glad to see the idea is resonating for other people. I'm not sure what UR? scheme is best for these proof files, but my hunch is that hashes will be important. For one, definitions, theorems, and proofs are probably a lot more immutable than most of the other objects W3C people deal with. I'm also envisioning a Napster-style propagation of referenced objects. In typical usage, I can imagine a person working on a proof in conjunction with a client which snarfs proofs and lemmas from the rest of the Web, then, when it's done, uploads everything to a static web space. Thus, URL's move around fluidly, but the hash is constant.
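
Here's a minimal Python sketch of what I have in mind (layout and names invented): a proof file is stored under the hash of its bytes, so a local cache can resolve a reference without touching the net, and any URL serving the same bytes is interchangeable.

    import hashlib
    from pathlib import Path

    CACHE = Path("proof-cache")

    def store(data):
        CACHE.mkdir(exist_ok=True)
        digest = hashlib.sha1(data).hexdigest()
        (CACHE / digest).write_bytes(data)
        return digest

    def fetch(digest):
        path = CACHE / digest
        if path.exists():
            return path.read_bytes()  # cache hit: no network needed
        raise LookupError("not cached; fall back to a URL known to serve it")

    ref = store(b"...some proof file...")
    print(ref, len(fetch(ref)))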

I think I'll write about choice of primitives over the coming days, unless people beg me to stop.

Math

My regular readers are no doubt wondering what it was that Bram and I talked about yesterday evening that was so much fun it left my diary so inarticulate. I'll try to summarize.

Rigorously formal mathematics is interesting, but has a well-earned reputation for being too painful and tedious to be really practical for anything. So far, it's been much more the domain of philosophers than of engineers.

Bram and I feel that the Web has the power to change that. For the most part, tools for mechanically checking proofs have basically been solo tools. For one person to get through a significant slice of mathematics is an ordeal. But if the work is split up among many, it's a lot more reasonable.

Collaboration requires standard file formats. Bram and I are thinking along the lines of a low-level, simple formal proof format. The main thing is to apply inference rules. Then, there's a higher level metadata format for saying things like "this is a proof of Fermat's Little Theorem using the Peano axioms for natural numbers".
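
To give a flavor of how low-level I mean, here's a toy Python checker (format invented on the spot) whose only inference rule is modus ponens:

    def check(hypotheses, proof):
        # proof is a list of steps: ("hyp", formula) or ("mp", i, j), where
        # line j must be ("->", A, B) and line i must be A; the step yields B.
        lines = []
        for step in proof:
            if step[0] == "hyp":
                assert step[1] in hypotheses
                lines.append(step[1])
            else:
                a, imp = lines[step[1]], lines[step[2]]
                assert imp[0] == "->" and imp[1] == a
                lines.append(imp[2])
        return lines[-1]

    # Derive q from p and p -> q.
    p, q = ("p",), ("q",)
    print(check([p, ("->", p, q)],
                [("hyp", p), ("hyp", ("->", p, q)), ("mp", 0, 1)]))  # ('q',)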

The design of such languages parallels the design of computer languages. Names and namespaces are particularly important. Proofs should be written as stand-alone modules, but also so that you can build up a structure by reusing the results in other proofs. Since these file formats are intended to live on the Web, links are critically important too. In particular, you want to be able to import other namespaces. Thus, in the example above, "the Peano axioms for natural numbers" would be a link.

A link can be a URL, but it can also be an immutable hash of the content. That way, if you have the file locally, you don't need to go to the net to resolve it.

Typically, these formal systems have set theory as a primitive (Metamath uses ZFC). Other objects are encoded in the primitives. For example, the usual encoding of natural numbers is: 0 = {}, i + 1 = i \union {i}. But there are other ways of encoding numbers, some of which might make more sense. For example, if you're doing calculations, a sequence of bits might be better. Of course, all the encodings of integers are equivalent, but if two proofs use two different encodings, it might be hard to knit them together.

So I think it's important to make it easy to convert between these different conventions. There are a couple of techniques I can think of. First, you might be able to make inferences based on models: "if P is a proof of theorem T using encoding E0, and E1 is a model of E0, then (some simple transformation of T) is also true". Another way would be to parameterize proofs, so they take, say, the integers as arguments. Any encoding of the integers which satisfied the Peano axioms would do; to use a proof with a different encoding, you just prove the Peano axioms in your new encoding.

The higher-level format is useful for humans wishing to browse the space of proofs, and also for clients to automatically verify proofs. At the very least, you want to let them crawl enough of the proof web that the expansion of the proof is complete. I'd also expect the usual Web patterns to show up: search engines, directories, pages full of annotated links and commentary.

The exact choice of lower-level language is probably not that important. You can always use tools to translate more convenient proof formats into the lower-level one, or even to automate part of the process. By using a common lower-level format, people using different tools can still collaborate. One important test of this principle is whether the Metamath proofs can be so translated.

Formal logic for proofs is not new, of course. The idea of a proof repository isn't new, either. But I am unaware of anyone who has designed languages for formal proofs with the explicit goal of fostering collaboration through the Web.

This Introduction to Mathematical Logic may be rewarding to those intrigued by these ideas.

14 Jul 2002 (updated 14 Jul 2002 at 07:53 UTC)

I've been having a lot of fun discussing with Bram the ideas mentioned yesterday. Thinking about formal math in the context of the Web is a lot of fun.
