FOAF-based whitelisting for email

Posted 26 Mar 2007 at 21:14 UTC by kjetilk

There are at least 20 million FOAF profiles out there, from Advogato, LiveJournal, Opera Community, personal hand-maintained files etc. In many cases, the FOAF profile is also an export of the social network, i.e. it contains statements that one user knows another.

I don't know any spammers, nor do any of my friends. Besides, spammers tend to use random MAIL FROM addresses anyway. So if any of my friends, or their friends, or indeed pretty much anyone reachable by following the social network originating from me, sends me an email, it is pretty safe to accept it.

We should put these data to good use. As part of a concerted effort to bring running Semantic Web code to the masses, I initiated a project that will develop software to compute a simple unidimensional trust metric, primarily based on network distance, and make it available. Furthermore, we will develop plugins for SpamAssassin and qpsmtpd to use the data. In fact, minimal plugins are already developed, so what remains is to compute the trust metrics, define how the plugins can access them, and create a scalable system where the metrics can be queried.
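To make the network-distance idea concrete, here is a minimal sketch, not the project's actual code. It assumes the knows-statements have already been extracted into a plain dictionary keyed by email address; the function names, the hop limit and the 1/(1+distance) scoring are all hypothetical choices made for illustration.

```python
from collections import deque


def network_distances(knows, origin, max_hops=4):
    """Breadth-first search over a 'knows' graph.

    Returns a dict mapping each reachable address to its hop count
    from origin, exploring at most max_hops hops (assumed cut-off).
    """
    distances = {origin: 0}
    queue = deque([origin])
    while queue:
        person = queue.popleft()
        if distances[person] >= max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return distances


def trust_score(distance):
    """Map a hop count to a score in (0, 1]; unknown senders get 0.

    The 1 / (1 + distance) shape is an assumption, not a project decision.
    """
    return 0.0 if distance is None else 1.0 / (1 + distance)


# Hypothetical data: each key "knows" the addresses in its list.
knows = {
    "me@example.org": ["alice@example.org"],
    "alice@example.org": ["bob@example.org"],
    "bob@example.org": ["carol@example.org"],
}
distances = network_distances(knows, "me@example.org", max_hops=2)
```

A mail plugin could then look up `trust_score(distances.get(sender))` for the MAIL FROM address and treat a score of zero as "not in the network" rather than as evidence of spam.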

The project is using the foaf-dev mailing list for discussion. The initial discussion focused on the meaning of "trust" as well as the importance of topical trust. In particular, our friends from the Konfidi project have designed an elaborate trust system, relying in part on FOAF and in part on topical trust, and strengthened with PGP. Clearly, the statement that you know someone does not imply that you certify that the person will not spam. Also, that I trust my climbing mate with my life when climbing does not mean that I'll tell him my root password.

However, it was resolved that we should only require the data already on the web. Even if that means very little topical trust data is out there, it is most likely good enough to be useful for the task of whitelisting email from your network.

I do not plan to deny all the email from senders not in my social network myself. It is just one of many anti-spam measures; some are cheap, some are heavy. I expect this to ease the load on my mail servers. I also plan to do OCR on emails containing images. Some argue that they would gladly drop all image emails from anyone not in their addressbook, and so a one-hop social network would suffice. It is not hard to come up with a use case that would make this look silly: just think about the girl you met at a party last night. She got your email address, but you had not yet entered hers into your addressbook, and now she's sending you her picture... Even if she isn't in your addressbook, it is not unlikely that she is in your social network. I personally will use it to ease the spam-scanning load, but you are of course free to use it as you see fit.

There are several anticipated attacks on this system. Just sneaking in fake email addresses will not mean much, but inserting fake knows-statements will. We will need to be careful about the sources we have for those. Also, there will be some (natural) supernodes, and if spammers start to use the supernodes' addresses in their MAIL FROMs, it will become a big annoyance for those people. Thus, strengthening the system with OpenID, SPF, DomainKeys or even PGP may be needed, either as a modifier to the trust metric, or implemented as a part of plugins to mail systems.

We would like to take advantage of the experience that Advogato has gained from years of work on trust metrics. We can always use a hand with development, and we would of course want to use the FOAF data from Advogato.

Excluding forged FROM addresses is crucial., posted 26 Mar 2007 at 22:01 UTC by Pizza » (Master)

If you can't trust that the FROM address isn't forged, then any verification derived from the address is meaningless. End of story.

Similarly, if anyone in the FOAF chain is compromised, it all crashes down because the trust vector becomes the transmission vector. This is often the case with virus/worm-laden mail.

inverse function, posted 26 Mar 2007 at 22:15 UTC by lkcl » (Master)

quantum mechanics shows that the inverse wave function is essential.

therefore, just like you say, pizza - it's not enough to have a one-way link: you also need to have a weighting for the link the other way, too.

not only _that_, but the "context" in which the FOAF linking is known is _also_ important, so now you have a weighting thrown in for that.

not only _that_, but also it is important to factor in as many different "contexts" - different FOAF sources - as possible, giving weights for each one.

in fact, you probably don't want weights for each one, at all, but you want to "normalise" it out, given the total number of FOAF sources available.

so. you have a normalised vector which gives weights to all of the FOAF sources (again, i believe that's from quantum mechanics - wave functions)

you then perform a "filter" function "on how many of these FOAF things from all the sources do we agree"?

and that becomes your total "spam" weight, times the total possible allowed weighting.

very very straightforward.

and very cool.

summary, posted 26 Mar 2007 at 22:17 UTC by lkcl » (Master)

normalised weighting of FOAF source "strength".

filtered percentage agreement of individual FOAF "agreement"

times SPAM weighting.

equals score.
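One possible reading of the summary above, as a minimal sketch. The comment does not define the terms precisely, so this assumes that each FOAF source has a non-negative "strength", that "agreement" means the source contains a knows-path to the sender, and that the result scales a fixed spam weighting. The function name, the source names and the numbers are hypothetical.

```python
def foaf_score(source_strengths, source_agrees, spam_weight):
    """Normalised source weights, filtered by agreement, times spam weight."""
    total = sum(source_strengths.values())
    if total == 0:
        return 0.0  # no usable sources, so no adjustment
    # Normalise so the weights of all sources sum to 1.
    normalised = {name: w / total for name, w in source_strengths.items()}
    # Keep only the weight of sources that agree the sender is known.
    agreement = sum(
        w for name, w in normalised.items() if source_agrees.get(name, False)
    )
    return agreement * spam_weight


# Hypothetical example: three sources, two of which know the sender.
strengths = {"advogato": 2.0, "livejournal": 1.0, "handmade": 1.0}
agrees = {"advogato": True, "livejournal": False, "handmade": True}
score = foaf_score(strengths, agrees, spam_weight=4.0)
```

With these numbers the normalised weights are 0.5, 0.25 and 0.25, the agreeing share is 0.75, and the score is 3.0 out of a possible 4.0.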


eh., posted 27 Mar 2007 at 05:53 UTC by ncm » (Master)

I get spam claiming to be from people I know all the time. It doesn't mean their machine is compromised. Rather, somebody who has sent them mail, or got mail from them, was compromised. The spambot harvested the "To:" and "From:" addresses from that other person's mailbox. The names I recognize are all people who are active on multiple mailing lists, so knowing who they are doesn't help in discovering who was compromised.

The SPF and DomainKeys etc comment, posted 27 Mar 2007 at 12:50 UTC by kjetilk » (Journeyer)

ncm and Pizza: we are fully aware of that problem, thus the comment about SPF, DomainKeys and PGP etc. It isn't a very simple problem, since people don't like very strong stuff like PGP, but some people with lots of experience from SpamAssassin say that the support for SPF is getting sufficiently good to be relied on. I never liked SPF, but that's life...

email - one-way communication anyway, posted 27 Mar 2007 at 19:38 UTC by lkcl » (Master)

what's the one thing that email can't do?

it can't be used to communicate, two-way, on complex issues.

it's just not possible. especially when there are a number of people involved.

our minds cannot remember enough from one email message to the next.

only _very_ intelligent individuals, with _extremely_ good memories, can use email to successfully communicate, and even then, only on issues which do not require diagrams, hand-waving, emotion, proper emphasis...

... and what was the _first_ thing that microtoss added to email? rich text formatting, of course. embedded html for "outsiders".

and, now that "rich content" is here, what's the _single_ largest source of problems?

xxxxing microsoft outlook depressed email.

i was going to say something clever like "in time, when history looks back..." but it's so blatantly obvious that email is _the_ biggest communications failure that humanity has ever invented.

so - whilst efforts to improve email communication and reduce spam etc. are laudable, i question even the _usefulness_ of any such efforts.

(btw i'm not criticising your work: i realise we still have to have email...)

You don't know any spammers?, posted 11 Apr 2007 at 11:26 UTC by Chicago » (Journeyer)

Spammers don't go round telling people that they are spammers. They usually deny it strongly, mainly 'cause they would get beaten to a pulp on the spot.

I'm not a spammer.
