Advogato: Certifier Nullification and the Advogato Trust Metric

This article is my response to the recent article and subsequent discussion about the trust metric--you'll note the influence here of several of the good replies posted there.

Journeyer status has become far too common. People's certifications tend toward Journeyer status, seemingly independent of whether they deserve it.

When I first signed (back) up, and Denny certed me as an Apprentice, I figured that that was the most appropriate slot for me to be in. By my own personal metric (largely influenced by Software Craftsmanship by Pete McBreen), I'd be a strong journeyer, in terms of my technical abilities to get software work accomplished, but just about all of that skill is applied for closed-source work for my day job, so according to Advogato's metric, my occasional helping out with Debian isn't enough for a higher cert. So I'm not an apprentice, but I am an Apprentice. Fair enough.

Then I started receiving a few Journeyer certs, and my bright green changed to cyan. I'm grateful that a few people think I'm Journeyer material, and I agree that on a broader level that's where I belong, but from a Free Software perspective I don't merit that title at all. Maybe that means that the definition needs to change, or else extra ranking options need to be added in parallel to what we have right now--to account for general technical skill along with prominence in the Free Software community. But more on all that later...

Certifying people higher than they deserve is actually just the general case of certifying people who should never have been certified at all (who shouldn't rank above Observer!), which was the focus of the trust metric at its inception. So we can examine the trust metric and see what wisdom we can glean by looking at it in light of the more general case.

In terms of the trust metric security proof, our problem is that there are a lot of "confused" nodes who are certifying "bad" nodes. Anybody who gives out Journeyer or Master certs to one or more users who shouldn't be receiving them is "confused"--which is a generalization of the original specific case that the trust metric definition paper discusses, of should-be Observers being certified at Apprentice or above by "confused" users.

In the trust metric definition paper, Raph proves that the number of bad nodes certified is bounded by the number of confused nodes (Theorem 1). The point is that it's not ultimately a function of the number of bad nodes--so Bad Guys(TM) can create as many accounts as they want, but unless they can trick other, certified users into certifying them it won't make a difference.

However, it points to another problem one level removed from that one: The number of confused nodes isn't itself bounded! So if a large number of people are confused and start certifying people they shouldn't (and/or, same thing, certifying them higher than they should be), then we're going to have a (potentially) large number of certified bad nodes. So the plain answer is, we need to reduce the number of confused nodes in the graph, in order to reduce the number of certified bad nodes.

But wait a minute... let's go back to basics and review the key definitions here. Raph's distinction between "good", "confused", and "bad" wasn't a part of the formal mathematical definition of the trust metric--it was introduced for the security proof--so we need to scrutinize carefully what he, and we, mean by those terms. Saith he:

The bad nodes are under the attackers [sic] control. The confused nodes themselves represent valid accounts, but may contain certificates to the bad nodes. The good nodes are both valid accounts and have certificates only for other good nodes and confused nodes.

Raph's accidental omission of the apostrophe here is interesting: We presume it should be "attacker's", meaning one malicious individual is creating a bunch of accounts to try to wreak havoc. However, it could easily be "attackers'", if there is a large group of people, creating one account per person, performing the attack. It makes no difference to the resilience of the underlying mathematical model.

The key insight comes from that observation. What is an attacker, anyway? "Well, an attacker is, um, one who attacks." Spiffy. What do you mean by "attack"? The word isn't explicitly defined in the paper. We can always fall back to its dictionary definition, but... what does that have to do with the model? Nothing. The paper didn't define it because it didn't need to. The security proof holds independent of what you're securing against.

Instead of saying that the bad nodes are "attackers", let's generalize it to say that the bad nodes have Property X. Confused nodes don't have Property X (or else they'd be bad), but are certified themselves and have certified (or overcertified) bad nodes; good (certified) nodes never do so. Let's say that the certified users that are confused about Property X will themselves have Property C(X). By the generalized trust metric security proof, reducing the number of users who have Property C(X) reduces the number of (over)certified users who have Property X.

Let's say that Property X ceases to be "is actively attempting to ruin the Advogato experience maliciously or for personal gain" and is replaced with "does any sort of closed-source software development". That makes me a bad node, and all those people that certified me are confused. Use instead "works on *BSD", and a raft of other people become bad, and I become good once again (unless I happen to have certified a *BSD developer, in which case I'm now confused). Or we could use properties about where people live, their hat size, or, more ominously, their skin color or such like. If Property X is defined to be "is a kernel developer", then alan is a bad node, and the root node is confused for certifying him as a Master!
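To make the relabelling concrete, here is a minimal sketch of the good/confused/bad classification with Property X passed in as a parameter. The graph, the account names, and the two attribute sets are all made up for illustration, and the sketch assumes every account shown is already certified; it does not implement the trust metric itself, only the labelling used in the security proof.

```python
from typing import Callable, Dict, Set

# Hypothetical toy graph: certifier -> accounts they have certified.
CERTS: Dict[str, Set[str]] = {
    "seed": {"alice", "bob"},
    "alice": {"carol"},
    "bob": {"dave"},
    "carol": set(),
    "dave": set(),
}


def classify(certs: Dict[str, Set[str]],
             has_property_x: Callable[[str], bool]) -> Dict[str, str]:
    """Label each account relative to an arbitrary Property X."""
    labels = {}
    for node, targets in certs.items():
        if has_property_x(node):
            labels[node] = "bad"          # has Property X
        elif any(has_property_x(t) for t in targets):
            labels[node] = "confused"     # has Property C(X)
        else:
            labels[node] = "good"
    return labels


# Two made-up choices of Property X over the same graph.
CLOSED_SOURCE = {"dave"}
WORKS_ON_BSD = {"carol"}

by_closed_source = classify(CERTS, lambda n: n in CLOSED_SOURCE)
by_bsd = classify(CERTS, lambda n: n in WORKS_ON_BSD)
```

The certificates never change between the two calls; only the predicate does, and with it who counts as bad and who counts as confused.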

What does all this mean? The trust metric is value-neutral per se. The values that Advogato chooses to hold to come from the users--the certifiers--not the trust metric. The officially endorsed values of Advogato aren't encoded in the trust metric, and so can only be enforced by the individual users. (You could say that there are some value choices implied by the selection of certain users as trusted seed users, but that selection is an external process, and not inherent in the trust metric itself. The trust metric mandates the existence of seed users, but it doesn't dictate who they are.)

This has two implications:

1. All complaints along the lines of, "The trust metric is broken because we're seeing $CLASS_OF_PEOPLE being certified", in fact say nothing whatsoever about the trust metric itself. The trust metric ain't broke; if anything, the trusters are.
2. The things that Advogato values and rewards are a function of the users of the system, up to the limits of what their individual certification levels allow and disallow. They have nothing to do with the published guidelines of who should be certified how, unless the certifiers decide that they should.

That second point is worth a bit more examination. In the United States and elsewhere there's a legal principle called the right of jury nullification, which allows jurors in a trial to acquit a defendant of violating a law that they believe to be morally wrong. In effect it allows them to judge the law, and render it ineffective (nullify it) by refusing to convict even if the defendant did what the law said not to do.

This is effectively what's happening on Advogato. Even though the law has been laid down concerning whom we should certify and how, it's ultimately up to "We, the People" to judge in individual cases whether somebody ought to be certified or not. If we decide to ignore the law, then people will be certified as Journeyers and Masters when by the rules they are at most Apprentices--and there's nothing that the rules (ultimately just an HTML file on a server somewhere) can do about it.

So what do we do? I see four options. We can handle the problem of too many confused users:

1. ...by proclaiming it "not a problem": Continue with the existing system as is. Stop complaining and get on with our projects and our lives.
2. ...by reducing confusion: Initiate a massive campaign to educate the userbase as to what the certification guidelines mean, how to apply them in particular situations, encouraging and thanking them when correct decisions are made, etc., etc.
3. ...by removing their certification: Modify the certification guidelines to make explicit what's already implicit in the trust metric: When you say that somebody is a Journeyer, you're not just making a judgment about their contributions to Free Software, you're pronouncing them fit and capable of competently judging others the same way. That is, Property X (badness) becomes "not (works on Free Software AND doesn't have Property C(X) (confusion))". What's "confused" today becomes "bad" tomorrow--which makes sense, if we esteem accurate handling of certifications, and not just involvement in a Free Software project. If somebody you certified starts certifying people in a way you don't think is right, yank their cert.
   The present glut of Journeyers means we've probably got at least a few users with bad judgment, including some somewhere close to the root node, who need to have their own Journeyer certs downgraded or removed altogether.
4. ...by changing the trust metric to make them a non-issue: Along the lines of what was mentioned in passing above (and what has been discussed several times over Advogato's history), modify the trust metric to split between the functional certification of "you work on an important Free Software project" and the trust certification of allowing their certs to have an impact on others.

I advocate option 4. It doesn't do away with the problem of certifier nullification (if it's in fact a problem--if essentially nobody agrees with Raph's guidelines, then maybe they should be trashed!), but it makes it easier to comply with the letter of the law without other consequences rippling throughout the trust network: "Yes, I believe you're a Journeyer according to the Free Software measure, but no, I don't believe you have enough good sense to make accurate judgments of other people!"

Option 3 has some problematic Gödelian implications, since if confused nodes are now relabelled "bad", then the formerly good nodes that had certified them must now be relabelled "confused". Another iteration is required so that we can relabel those nodes as "bad", and propagate the confusion further up the chain of formerly good users towards the root of certification. Ultimately the seed users themselves must be labelled "bad" (sorry, Raph!).
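The regress described above can be sketched as a fixed-point loop: relabel every confused node as bad, recompute who is now confused, and repeat until nothing changes. The graph and the account names below are hypothetical, and the relabelling rule is my reading of option 3, not anything defined in Raph's paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical chain of certifications leading from the seed to one bad account.
CERT_GRAPH: Dict[str, Set[str]] = {
    "seed": {"alice", "bob"},
    "alice": {"carol"},
    "bob": {"dave"},
    "carol": {"mallory"},
}


def relabel_until_stable(certs: Dict[str, Set[str]],
                         initially_bad: Set[str]) -> Tuple[Set[str], int]:
    """Repeatedly turn 'confused' nodes into 'bad' ones until none remain."""
    bad = set(initially_bad)
    rounds = 0
    while True:
        # Confused: not (yet) bad, but has certified at least one bad node.
        confused = {node for node, targets in certs.items()
                    if node not in bad and targets & bad}
        if not confused:
            return bad, rounds
        bad |= confused
        rounds += 1


bad_nodes, rounds_needed = relabel_until_stable(CERT_GRAPH, {"mallory"})
```

In this toy graph the badness climbs from mallory through carol and alice to the seed, while bob and dave, who sit on a separate branch, are untouched.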

Besides which, any attempt to implement option 3 would still require the certifier to rank the person on both criteria (Free Software contribution and fitness to judge others), but keeping today's certification system means the certifier must then give the lower of the two ranks as the final rank. Faced with that, a certifier is much more likely to say, "No, forget that, they deserve more than Apprentice," and overcertify. Option 4 doesn't require any more thought on the certifier's part than option 3; it merely lets the system record the full result of that thought process, so nothing has to be compromised to force-fit it to one metric. Option 4 could even require less effort than option 3, if it's implemented so that either of the two rankings can default to "no opinion". So any concerns that option 4 would require "too much effort" are largely unfounded.
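As a rough illustration of the difference, the sketch below models a two-part cert in which either part may be left as "no opinion", next to the collapse that a single scale forces. The class and field names (`SplitCert`, `functional`, `trust`) are invented here, and falling back to the one rank that was given when the other is missing is my own assumption; none of this reflects actual Advogato code.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    OBSERVER = 0
    APPRENTICE = 1
    JOURNEYER = 2
    MASTER = 3


@dataclass(frozen=True)
class SplitCert:
    """Option 4 sketch: two independent rankings, each optional."""
    functional: Optional[Level] = None  # Free Software contribution
    trust: Optional[Level] = None       # fitness to judge others


def collapse_to_single_scale(cert: SplitCert) -> Optional[Level]:
    """What a single scale forces: the lower of the ranks that were given."""
    given = [lvl for lvl in (cert.functional, cert.trust) if lvl is not None]
    return min(given) if given else None


# A strong contributor whose judgment the certifier doubts.
cert = SplitCert(functional=Level.JOURNEYER, trust=Level.APPRENTICE)
single_rank = collapse_to_single_scale(cert)
```

With the split form, both opinions survive in the record; the single rank keeps only the lower one, which is exactly the compromise that tempts certifiers to round up.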

In today's reality, there are probably several factors (such as overall technical competence, entertainment value of diary and article postings, etc.) that influence each individual decision. Breaking out several rankings from today's solitary Observer->Master scale would allow more fine-grained evaluation of each other's abilities. (The comparatively new diary rating system is a step in this direction.) How to integrate multiple scales together into a meaningful system is an open question for discussion.

(Kudos to Raph for creating such an interesting trust metric in the first place. In the paper Raph effectively said he hoped it would extrapolate to other real-world issues and not just be confined to one Web site somewhere. It looks like it does, though perhaps not necessarily in quite the way he was expecting.)