ORBS, MAPS, and Trust Metrics

Posted 13 Jul 2001 at 16:29 UTC by sethcohn

With the 'death' and rebirth of ORBS and the "subscription only" change to MAPS (slashdot), and overall the prevailing problem of spam, it's time to do something. This article proposes (and asks for feedback on) a method using trust metrics.

Opt-out doesn't work. Contrary to my illustrious Senator Ron Wyden here in Oregon (slashdot), I think opt-out isn't even close to an answer.

RBL aka MAPS is now going to be a 'pay' service, and it wasn't very effective anyway.

ORBS, even after the fork (slashdot), has never been a huge deterrent either.

I posted a version of some of this over on XMLvL.net, lkcl's new site for xvl, his NG improvement of mod_virgule, the engine that runs advogato, as part of a short discussion of what trust metrics can do for email and spam. That site isn't quite ready for big usage, as previews don't work quite right yet... so here's the gist, open for feedback and, more so, looking for someone to take this idea and run with it (I have neither the time nor the expertise).

In explaining 'trust metrics' to friends, the potential for fixing spam in both email and Usenet was really clear: build a mail or news client/server with trust metric potentials.

I cert my friends, my work, etc and so their mail comes thru fine.

Friends of friends, etc, will get thru, with a lesser potential, and the further the metric stretches, the less likely I probably need to read the email.

Unknown people who have good certs from someplace I trust get thru too (and that is where community trust comes in... imagine if your email client/server said 'oh, well, sethcohn has an advogato rating, so it's not spam').

If I get email and it has no certs, it's likely junk. If I get certified spam, I can look to see who is certing it, and drop my cert of them.
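As a rough illustration of "friends of friends get thru with a lesser potential", here is a minimal sketch that halves trust at each hop away from me. The function name, the decay factor, and the hop limit are all assumptions made for this example; this is a plain distance decay and not the network-flow metric Advogato actually uses.

```python
from collections import deque

def trust_by_distance(certs, me, decay=0.5, max_hops=3):
    """Breadth-first walk over the cert graph, starting from `me`.

    `certs` maps each user to the set of users they have certified.
    Returns {user: trust}, where trust is decay ** hops (1.0 for `me`).
    The decay factor and hop limit are illustrative assumptions.
    """
    trust = {me: 1.0}
    queue = deque([(me, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend trust past the hop limit
        for friend in certs.get(user, ()):
            if friend not in trust:  # first visit is the shortest path
                trust[friend] = decay ** (hops + 1)
                queue.append((friend, hops + 1))
    return trust

# Hypothetical cert graph for illustration only.
certs = {
    "seth": {"alice", "work"},
    "alice": {"bob"},
    "bob": {"carol"},
    "spammer": {"spammer2"},  # nobody I trust has certified these
}
scores = trust_by_distance(certs, "seth")
```

Anyone missing from `scores` has no cert path from me at all, which is the "no certs, likely junk" case. Dropping my cert of someone is just removing them from my entry in `certs` and recomputing.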

Someone proposed something similar on slashdot recently using public key signatures, but of course, the major problem becomes you end up forcing everyone to get a signature in order for the system to work. And then you end up with management problems and key issues and authorizing issuers and much more. This would fix those by growing over time, and allowing people to join the trust metric movement slowly. I think PK sigs have a place, but getting people to participate is a slow process and it won't work in the meantime.

By my certing my friends as 'trusted' email users, very quickly a 'valid' base of users would be built. Even unknown emailers would be 'embraced' by the system if they weren't sending spam, and if they started, they'd be decerted pretty quickly.

How do you trust that mail from "unknown to you person joeuser@somedomain.org" is really from "joeuser who is listed on advogato"?

If you were using a "trust metric aware" mail client (or server), I'd envision it similar to the way <wiki>Meatball?:InterWiki</wiki> works... it would have a list of valid 'sites' and a way to do a lookup on each of those sites. So joeuser sends you an email... using his primary mail address (let's make one up here):

joeuser@sethhatesspam.org

Your mail client (or your server? There are advantages/disadvantages to either approach) gets this, and one of 3 things happens:

1) You already have that address in your address book, so it's accepted.

2) You don't have it, but he included a pointer

(a header: X-trustmetric: advogato)
to a site where you can validate trustworthiness. This would be because he is also using a "trust-metric aware" email client/server. Your mail client (or server) does a lookup via the Interwiki-style lookups
(ie advogato:joeuser or advogato:joeuser@sethhatesspam.org)
and finds a valid and trusted cert, and possibly alternate valid email addresses for him.

This won't directly stop mail spoofing (someone else claiming to be joeuser could claim the metric) but there are other existing methods to fix that, like PK signatures. I don't think trust metrics can do it all. But since you'd have to spoof mail to do this, it's a minor issue and _much_ easier legally to control. Hijacking an address is clearly wrong and couldn't be covered by any stretch of 'free speech'.

3) If he didn't include a pointer to a trust metric anywhere: Now your mail client (or server) has to do some work. It might be able to query a database or 2 or 5 out there that are created for this very purpose... And those might be tied to the mail clients in that they will upload trust metric data to a central server and in exchange, be able to tap the community trust web... It might also just be able to send a query to each of the Interwiki-ish sites it knows about and ask them if they have ever heard of joeuser.

Either way: yes, one of them has heard of him, so you get the email validated as 'good'

None of the above: Sorry, joeuser is unknown, might be spam, let's flag it. If it wasn't spam, you should be able to hit a single button/key and joeuser will be notified that his email was not trusted, and that he should remedy this ASAP by getting a listing someplace and providing links to do so. But it will work without his having to do so, so this isn't an exclusive club... and hopefully responsible users would opt-in quickly. Note how this completely reverses the dynamic compared to ORBS or MAPS: We validate email users, not invalidate domains or ISPs...
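To make the three cases concrete, here is a minimal sketch of the decision a client (or server) might make. Everything in it is an assumption for illustration: `KNOWN_SITES` stands in for real remote trust-metric sites, `site_lookup` for an Interwiki-style query such as advogato:joeuser, and the X-trustmetric header is the hypothetical one proposed above. Spoofing is not addressed here at all.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical stand-in for remote trust-metric sites: site name -> certified users.
KNOWN_SITES = {"advogato": {"joeuser", "sethcohn"}}

def site_lookup(site, name):
    """Pretend Interwiki-style lookup, e.g. advogato:joeuser."""
    return name in KNOWN_SITES.get(site, set())

def classify(raw_message, address_book):
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    local_part = sender.split("@")[0]

    # Case 1: the sender is already in the address book.
    if sender in address_book:
        return "accepted"

    # Case 2: the sender supplied a pointer to a trust-metric site.
    pointer = msg.get("X-trustmetric")
    if pointer and site_lookup(pointer.strip().lower(), local_part):
        return "accepted"

    # Case 3: no usable pointer, so ask every site we know about.
    if any(site_lookup(site, local_part) for site in KNOWN_SITES):
        return "accepted"

    # None of the above: flag the mail, but do not drop it.
    return "flagged"

known = "From: friend@example.org\n\nhi"
pointer = "From: joeuser@sethhatesspam.org\nX-trustmetric: advogato\n\nhi"
no_pointer = "From: Seth <sethcohn@example.com>\n\nhi"
unknown = "From: stranger@example.net\n\nbuy now"
```

Note that the sketch only ever flags mail; the "notify joeuser with one keystroke" step would hang off the "flagged" result.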

Using an Interwiki approach, the solution should scale well, since it could dynamically remap to new servers as needed, and metaservers could be built to cache and handle the load if this took off.

If an ISP or company set up a trustmetric site for their users, great, more power to them, (using LDAP?) and if it went sour (due to spammers being certed), it could just be removed from the Interwiki-ish list, or even better, the list could be using trust metrics itself and the site would be decerted.

Usenet could work the same way, with talented/knowledgeable posters being rated highly and spam articles being dropped quickly due to lack of valid certificates. Since Usenet was/is/can be so highly specialized, a cert in one area wouldn't translate far, but it might carry over to related newsgroups (so a high alt.kibology rating wouldn't mean much in sci.genetics, but a high sci.physics might...)

Feedback welcome... I wish I had the skills and time to code this up, but I invite anyone to use this idea and go for it.


MAPS Quite Effective, posted 13 Jul 2001 at 18:07 UTC by Waldo » (Journeyer)

RBL aka MAPS is now going to be a 'pay' service, and it wasn't very effective anyway.

I disagree, at least in terms of my experience with MAPS. For political reasons, I turned off ORBS blocking a few months ago, though only on a few of my mail servers. Those servers are low-traffic, serving 5-20 users, though providing long-time e-mail addresses. (That is, the addresses have been in use for several years, and some have been used to post to Usenet, on discussion boards, etc.) MAPS proved an excellent solution. Somewhat more spam got through than with ORBS (I'd guess about 10% more), but there was not a single complaint of overblocking. I understand that my servers are hardly typical, but for all geeks out there running similar systems, I couldn't recommend MAPS more highly. (Politics aside. :)

How about a PGP-style web of trust?, posted 13 Jul 2001 at 19:35 UTC by deven » (Journeyer)

It seems to me that the solution may be some sort of PGP-style web of trust, where the mail servers are part of that web. Or maybe try to adapt some of the ideas of Usenet II to email. Advogato's certification system is centralized, and email is important to keep decentralized. This is just my opinion, but I think a "trusted network" of mail servers is likely to be the ultimate solution to spam.

Yes, it would probably require deployment of many new mail servers, quite probably using some protocol other than SMTP. On the other hand, if something works well enough, and can integrate with SMTP for legacy systems, the constant aggravation of spam might convince people it's worth the trouble...

Cert Servers, Not Users, posted 13 Jul 2001 at 20:00 UTC by Waldo » (Journeyer)

I think it would be most interesting if such a system could work in a less-detailed fashion, permitting mail servers to be certified instead of just users. After all, spam tends to originate not from untrusted users, but untrusted servers. (In a practical sense -- obviously, the humans are the ones sending the spam, not the machines, which blindly relay.) Which isn't to say that an individual certification system isn't a good idea. I just think that it would be far easier to implement a machine-based version.

Or maybe I'm missing the point. :)

Replies, posted 13 Jul 2001 at 20:37 UTC by sethcohn » (Master)

Waldo wrote:

Somewhat more spam got through than with ORBS (I'd guess about 10% more), but there was not a single complaint of overblocking.

But how much is that worth paying for? For free, yes, it worked, but as the articles referenced by slashdot pointed out, the cost of MAPS meant they ended up going to a charge model. Any centralized server solution will require that in the end. Someone will have to pay. What I envision would 'ride for free' on top of existing trust metrics and foster the creation of new trust metric sites on different topics.

I think it would be most interesting if such a system could work in a less-detailed fashion, permitting mail servers to be certified instead of just users. After all, spam tends to originate not from untrusted users, but untrusted servers. (In a practical sense -- obviously, the humans are the ones sending the spam, not the machines, which blindly relay.) Which isn't to say that an individual certification system isn't a good idea. I just think that it would be far easier to implement a machine-based version. Or maybe I'm missing the point. :)

Ever get spam from yahoo.com? I like yahoo.com's service and I use it as a primary mail account. By your logic, since tons of spam can (and do) come via yahoo.com's mail server, let's block the whole thing. Ditto for ALL commercial ISPs out there.

deven wrote:

It seems to me that the solution may be some sort of PGP-style web of trust, where the mail servers are part of that web. Or maybe try to adapt some of the ideas of Usenet II to email. Advogato's certification system is centralized, and email is important to keep decentralized. This is just my opinion, but I think a "trusted network" of mail servers is likely to be the ultimate solution to spam. Yes, it would probably require deployment of many new mail servers, quite probably using some protocol other than SMTP. On the other hand, if something works well enough, and can integrate with SMTP for legacy systems, the constant aggravation of spam might convince people it's worth the trouble...

I envision a trustmetric solution as being very decentralized. Ideally, dozens (hundreds?) of trustmetric sites could spring up, and the system could use any/all of them. I don't see people throwing out SMTP; the legacy software is here to stay, and turning the tide would be hard if not impossible. Adding a header-based system means that all existing servers and clients would work, and it could be implemented on either servers or clients (so users wouldn't depend on their providers having to upgrade).

"Trust" procmail, posted 13 Jul 2001 at 22:33 UTC by jschauma » (Observer)

While I think Waldo's idea of certing servers rather than individuals is more functional, I still would remain rather skeptical of the whole idea. I very often get email from people I do not know the least bit, which I still do not want to drop. That includes emails from:

  • people responding to a usenet-posting
  • people responding privately to a mailing-list-posting
  • people commenting on my website
  • people responding to anything open source related
All these emails would be labeled with the lowest priority, if not dropped altogether, since none of these senders is likely to have any means of indicating why I would want to read their mail. Heck, you can't even rely on people choosing appropriate subjects!

In addition, if you are (wild example, pulled out of chapeau claque) looking for a job and send your resume all over the internet, you get a lot of emails from other strangers that you do not want to miss.

My point being here: I'm very happy with procmail, which very reliably filters my incoming mail into different folders. Anything that's left after the sorting of mailing-lists, friends and family, work etc could be dropped into a mailbox called "unimportant" which you could scan periodically.

Anything that looks like spam goes to /dev/null right away anyway.

Spam Sources, posted 13 Jul 2001 at 22:51 UTC by Waldo » (Journeyer)

Ever get spam from yahoo.com? I like yahoo.com's service and I use it as a primary mail account. By your logic, since tons of spam can (and do) come via yahoo.com's mail server, let's block the whole thing. Ditto for ALL commercial ISPs out there.

Actually, no, I can't remember ever getting spam from Yahoo. I get lots of spam that claims to originate from Yahoo, but I can't recall a single instance of when it's really come from Yahoo. A quick sort of my spam folder (yes, I've saved all of it for the past 2 years :) and a random selection of the few dozen @yahoo.com spams reveals not a single one that actually originated from Yahoo. This is, of course, because Yahoo requires that all mail be sent via their web-based system, which is not practical for bulk-mailing.

That said, I see your point that there are major providers through which spam could go and it would therefore be less trusted. But if 99.9% of the mail that Yahoo sends is not spam, then they should score really quite highly on a trust metric. OTOH, if a small-town ISP runs an open relay and only 80% of the mail that they send is not spam, then they're going to rank really quite low.

By your logic, since tons of spam can (and do) come via yahoo.com's mail server, let's block the whole thing.

For discussion's sake, let's say that tons of spam comes through Yahoo's mail server. But remember that this is a trust metric -- we're not blocking it, we're permitting users to apply that metric to their mail to filter their mail as they see fit. Furthermore, if "tons" of spam really did come through Yahoo then, yes, I would like to block the whole thing on my personal trust metric settings.
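A minimal sketch of that per-server idea, using the 99.9% and 80% figures from above. The function names, the message counts, and the thresholds are made up for illustration; how the counts would actually be gathered is left open.

```python
def server_score(total, spam):
    """Fraction of a server's observed mail that was not spam (None if no data)."""
    if total == 0:
        return None
    return (total - spam) / total

def action_for(score, threshold):
    """Each user picks their own threshold; nothing is blocked centrally."""
    if score is None:
        return "unknown"
    return "deliver" if score >= threshold else "filter"

# Hypothetical counts matching the percentages in the text.
big_provider = server_score(100000, 100)   # 99.9% not spam
small_relay = server_score(1000, 200)      # 80% not spam
```

A strict user might call `action_for(small_relay, 0.95)` and filter that ISP, while a more tolerant one using 0.75 would still deliver its mail. The metric is the same; only the personal setting differs.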

You have a good system in mind here. I just think that it would be far too labor-intensive and systems-intensive to set it up on a person-by-person basis. I believe that a server-based system takes the ORBS and MAPS approaches and extends them into a distributed, community-driven approach that lets the end user script the action that they'd like for their mail system to take. In fact, this seems like an ideal use of XML-RPC, now that I think about it. It's eminently doable, with a little work on the part of MUAs or a little Procmail wizardry.

Replies 2, posted 14 Jul 2001 at 08:27 UTC by sethcohn » (Master)

jschauma writes:

I very often get email from people I do not know the least bit, which I still do not want to drop. That includes emails from: people responding to a usenet-posting; people responding privately to a mailing-list-posting; people commenting on my website; people responding to anything open source related.

And you also list the example of a jobhunt. For all of this, you list using procmail as the mail filter.

No question, procmail does a great job. But it's also a 'personal' solution. You taught it your friends, work, mailing lists, etc... And added spam rules too. A 'roll your own' is always going to be most effective and happy making. For that matter, anyone with a domain and a mail server can create onetime or disposable accounts, so they don't worry either. It's the average user we need to look out for, who doesn't use procmail and doesn't have access to more than one or two email addresses at a time. This user uses major ISPs, and I'll address this below too.

As for the other examples... I'm not suggesting that mail be dropped... it would take a while before enough 'trust' was in the system to warrant that. And at first, since most people wouldn't be listed in a trust metric, you would have a large volume of 'spam-possible' emails, which you'd have to look through... but look at the circumstances in a short time... each of those people you answer ends up in your address books, and gets a trust metric listing via that. Your trust metric hopefully melds with others, generating a large trust database of 'good users'. Each of your replies would hopefully (as a footer or other note) include a pointer to how to avoid being classified as 'uncertified', especially if the answer is as simple as adding a X- header and visiting a website once.

We don't worry about abuse of the large database for 2 reasons: first, access would be on a lookup basis, and second, even if a 'list' could be generated, what good is it? Abuse (via spamming the list) would lead to being cut off from those very people, even if you did decide to 'blow a valid cert' in doing so.

Waldo writes:

I get lots of spam that claims to originate from Yahoo, but I can't recall a single instance of when it's really come from Yahoo.

My bad example. You are correct: often the spam comes from someplace else, with a yahoo return address.

That said, I see your point that there are major providers through which spam could go and it would therefore be less trusted.

Yes, my point exactly.

OTOH, if a small-town ISP runs an open relay and only 80% of the mail that they send is not spam, then they're going to rank really quite low.

Another reason that server-level blocking is unfair: you are more likely to block a 'smalltown.net' ISP than a national one like 'earthlink.net' for fear of blocking too many people at once. Yet the user of the smalltown.net account is being punished for choosing an independent ISP (and often the local ISP is more likely to fix an open relay if notified anyway, you ever try to get to a real tech at a big ISP?).

I recall a few instances where 'local' ISPs were 'blackholed' for an instance of spam, yet the ISP and its other users continued to be 'punished' long after the spammer was gone. Why are we treating ISPs as if they are responsible for something they aren't? If they are common carriers, they aren't responsible, and if they aren't, then they are liable for any content coming through. Let's not make them the focus if possible; that is where all of the lawsuits end up coming from. Blocking aol.com is just as wrong as blocking littleisp.net, but aol.com is more likely to sue, and also more likely (by sheer numbers) to have more spammers. (and by aol, I mean any of the bigger ISPs)

You have a good system in mind here. I just think that it would be far too labor-intensive and systems-intensive to set it up on a person-by-person basis.

Database intensive I think yes, but labor intensive? Keep in mind, I envision this as a benefit to using trust metrics. In other words, this is NOT something solely for the sake of stopping email spam. It is merely one benefit to the proliferation of trust metrics. There are many many many other benefits and uses, and that is certainly not the topic of this article discussion.

I believe that a server-based system takes the ORBS and MAPS approaches and extends them into a distributed, community-driven approach that lets the end user script the action that they'd like for their mail system to take. In fact, this seems like an ideal use of XML-RPC, now that I think about it. It's eminently doable, with a little work on the part of MUAs or a little Procmail wizardry.

I'd like to see someone do something with it. Even a server-level approach would be a start. Personally I think the user level method would work well, and as a side benefit, encourage more trust metric usage, which would feedback positively on many levels. As a global society, we have a huge issue of 'who can we listen to for accurate information?' on many levels, and trust metrics have the potential to solve some of that issue.

Teaching people (reply to sethcohn), posted 14 Jul 2001 at 14:43 UTC by jschauma » (Observer)

Yes, sethcohn, I see your point. procmail is a personal solution and probably not quite appropriate for Joe Sixpack with his AOL account. However, the only way this trust-metric could work at all (for the mass-market!) is if it is completely automatic and does not require any interaction from the correspondent. You wrote:

[...] Each of your replies would hopefully (as a footer or other note) include a pointer to how to avoid being classified as 'uncertified', especially if the answer is as simple as adding a X-header and visiting a website once.[...]
Visiting a website - maybe, but "as simple as adding a X-header"?? Have you tried telling the typical Outlook Express user how to add a X-header? No, this process would have to be automatic. IE:
  • You receive an email from somebody "untrusted"
  • By replying, that person's address gets bumped up in your system
  • Your reply causes the correspondent's mailreader to automatically add, say, an X-header, to other emails sent to you
Now the problem here is obvious: it requires quite a degree of complexity on the side of the mail-reading agent. What it boils down to, it sounds to me, is automatic PGP. Currently, you can guarantee that you get proper email by using the public-key/private-key scheme, which also, basically, only requires the corresponding party to "visit a website" (to retrieve the public key).

Btw, about your reference to usenet: scoring articles up or down is already implemented in slrn and Gnus. You can up- or down-score articles that have certain properties, and according to the score they are sorted, highlighted or whatnot. Criteria can be applied to all newsgroups, to some, or to one in particular. For example, any article posted to any of the unix-related newsgroups I read that is composed with Outlook Express gets 100 points taken off its score. :)
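For readers who have not used a scoring newsreader, the idea can be sketched roughly as below. This is not slrn's or Gnus's actual score-file format; the rule layout, the header checked, and the newsgroup patterns are assumptions chosen to mirror the Outlook Express example.

```python
import re

# Each rule: (newsgroup pattern, header name, value pattern, score change).
# All rules here are hypothetical examples.
RULES = [
    (r"^comp\.unix\.", "X-Newsreader", r"Outlook Express", -100),
    (r".*", "From", r"trusted@example\.org", 50),
]

def score_article(newsgroup, headers):
    """Sum the score changes of every rule that matches this article."""
    total = 0
    for group_pat, header, value_pat, delta in RULES:
        in_group = re.search(group_pat, newsgroup)
        matches = re.search(value_pat, headers.get(header, ""))
        if in_group and matches:
            total += delta
    return total

oe_post = {"X-Newsreader": "Microsoft Outlook Express 5.50"}
```

The first rule only applies inside comp.unix.* groups, the second everywhere, which is the "all newsgroups, some, or one in particular" distinction. The resulting number would then drive sorting or highlighting.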

This can be applied to mail as well in gnus, as far as I understand it (I don't use Gnus - but hey, maybe this should be added to mutt... hmmmm... :). But again, this requires quite some interaction from the user, and the typical AOLer will not want to deal with this.

People, unfortunately, often are cows.

I've been thinking about this for a couple of weeks.., posted 14 Jul 2001 at 19:41 UTC by steved » (Journeyer)

Similar thoughts crossed my mind recently, although for different reasons. My idea was basically that there should be a way for people to make (signed) assertions about mail servers, and to query (and use, based on an assigned or computed trust) assertions made by others. The assertions could cover any number of things, such as openness to relay, existence of abuse@ or postmaster@ addresses, timeliness of response to complaints, etc. The user can then decide their own policy on any of the standard controversial issues (direct-to-MX from dialup, dynamic IP, open relays, etc.)
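One way such assertions might be represented and combined is sketched below. The field names, the trust-weighted vote, and the `min_trust` cutoff are all assumptions for illustration, and signature verification is deliberately left out: the sketch assumes every assertion has already been checked against its signer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    asserter: str  # who made the (already verified) claim
    server: str    # mail server the claim is about
    claim: str     # e.g. "open_relay" or "has_abuse_address"
    value: bool

def believed(assertions, trust, server, claim, min_trust=0.5):
    """Trust-weighted vote over assertions from sufficiently trusted people.

    Returns True or False if trusted asserters lean one way,
    or None if there is no trusted opinion (or an exact tie).
    """
    weight = 0.0
    for a in assertions:
        if a.server == server and a.claim == claim:
            t = trust.get(a.asserter, 0.0)
            if t >= min_trust:
                weight += t if a.value else -t
    if weight == 0:
        return None
    return weight > 0

# Hypothetical data: two trusted reports of an open relay, one untrusted denial.
assertions = [
    Assertion("alice", "mail.example.net", "open_relay", True),
    Assertion("bob", "mail.example.net", "open_relay", True),
    Assertion("mallory", "mail.example.net", "open_relay", False),
]
trust = {"alice": 0.9, "bob": 0.6, "mallory": 0.1}
```

The user's own policy sits on top of this: one person might reject mail from any server where `believed(..., "open_relay")` is True, another might only lower its priority.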

I would also want to be able to hand junk/unsolicited email that had slipped through the net over to a program that could try to find out the "class" of junk mail (open relay abuse, direct from dynamic IP, etc) and generate (with user approval) and track complaints and responses. The flip side would be periodically (once a week?) showing the user a list of the detected spam and allowing them to highlight false positives. This way you could collect stats on the reliability of different filtering rules.

Research, posted 14 Jul 2001 at 21:18 UTC by raph » (Master)

I just wanted to point out that "stamp trading networks", based on the capacity constrained network principle that fuels Advogato, are the topic of Chapter 7 of my thesis-in-progress. I believe that this is a very good approach to spam-resistant message delivery. A particular goal is to allow routing of email from sources "distant" in the trust graph, just at a far lower rate than your immediate friends. According to the theory, someone trying to send spam would quickly saturate all the capacities in his immediate area (in the trust graph), and be unable to send more than a tiny amount.
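The saturation argument can be illustrated with a toy model. This is only a sketch of the general capacity idea, not the stamp trading design from the thesis: the graph, the capacities, and the greedy one-path-per-message routing are assumptions, and there is no replenishment or proper max-flow computation.

```python
from collections import deque

def send(capacity, sender, recipient):
    """Try to deliver one message along any path with spare capacity.

    `capacity[u][v]` is how many more messages u may pass on to v.
    On success, one unit is consumed on every edge of the path used.
    """
    parent = {sender: None}
    queue = deque([sender])
    while queue and recipient not in parent:
        node = queue.popleft()
        for nxt, cap in capacity.get(node, {}).items():
            if cap > 0 and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if recipient not in parent:
        return False  # every route is saturated or missing
    node = recipient
    while parent[node] is not None:
        capacity[parent[node]][node] -= 1
        node = parent[node]
    return True

# Hypothetical graph: the spammer has one weak link into the trust graph.
capacity = {
    "spammer": {"dupe": 2},
    "dupe": {"alice": 5, "bob": 5},
    "friend": {"alice": 10},
}
delivered = sum(send(capacity, "spammer", to) for to in ["alice", "bob", "alice", "bob"])
friend_ok = send(capacity, "friend", "alice")
```

In this toy graph the spammer gets only two messages out before the single edge into the network is used up, while mail from an immediate friend is unaffected.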

So far, stamp trading networks are a speculative idea. Nobody knows how well they're really going to perform until a prototype is built. In addition, scaling remains a challenge. Routing in the residual capacity network is fairly easy when you have global knowledge of the network, but that's only realistic for small networks. I have some ideas about how to make routing scalable (based primarily on the Chord ideas), but they might well not be the right way to do it.

Obviously, such a thing wouldn't work with the existing sendmail-based architecture. But I think building an alternative e-mail infrastructure is entirely realistic. For one, webmail interfaces could switch over to it without difficulty. It would take a while, of course, to migrate, but I can easily imagine that after the new network gains critical mass, port 25 will be relegated to spam in much the same way that port 119 has become today.

"More research is needed." The exciting thing is that, as a research area, this really seems to be heating up. There is a need for better, more robust distributed authentication. Microsoft is building a system that (as far as I can tell) is based on pretty simple technology, but will no doubt be tuned to work reasonably well for the majority of users. Our best hope of winning is to develop something that truly is better.

I look forward to someday putting foo-colored ribbons on my homepage declaring "port 25 is for spam", and "just say no to the Spam Message Transmission Protocol!"

Walk-in email, posted 15 Jul 2001 at 08:09 UTC by ringbark » (Journeyer)

For the same reason as above, I'm reluctant to blow away unsolicited email altogether. I also receive unsolicited email in response to my web-based material and don't want to lose it.

The people at brightmail had a very successful anti-spam engine, which appeared trustworthy. Unfortunately, their business model wasn't good and the service was switched off in June. Pity, I would have been prepared to pay to receive less mail.

Certification and non-repudiation of email is an effective thing too: Thawte offers a mechanism for free certificates containing an email address or personal name, but they suffer the problem of low penetration. What we really need is a high-profile fraudulent email case or something like it. I live in my country's capital city, but can't get enough points from Thawte to get certified enough to certify others. And their requirements seem modest to me.

At the risk of being accused of triteness: yes, unsolicited commercial email is a major problem, but I don't see any straightforward solution any time soon, largely because existing and future trust systems have such low participation.

Mining the database, posted 16 Jul 2001 at 20:30 UTC by mcelrath » (Apprentice)

One problem with a cert system such as you suggest is that it provides a new opportunity for spammers to "mine" the database. If you're a spammer, trolling usenet, slashdot, etc. for e-mail addresses, it would be really cool to be able to go look up which e-mails are correct (as in, assigned a trust), and which ones are fake. Lots of people spam-proof their e-mail, or use throwaway accounts, and those will probably have zero trust-metric. Not only that, but the spammer can determine which of the harvested e-mail accounts has high trust, and forge the From: header to be from that person, increasing the chance it will get through your spam-filter.

I think pgp/gpg is the only way to go. The certification has to be cryptographic so that it can't be forged. Now if only everyone used it...

Using cryptography should be something taught in schools. I mean, they taught us how to write checks in Jr. High; in the 21st century they should be teaching us how to get a GPG key and maintain it. If everyone knew how it worked, it would be possible to implement electronic cash too...

auto-cert and digitally-signed Certs, posted 18 Jul 2001 at 13:53 UTC by lkcl » (Master)

i considered a way to add digital signatures to Certs, a while back. i'm still on the case, just haven't implemented it :) not least because it will be a minor pain: the Cert will have to be generated by the user (or generated by a server, and downloaded to the user), and signed, and then downloaded to the server.

yuuuck :)

_or_ the user runs their _own_ site, _or_ the user uses a _trusted_ site which they are happy to put their private key on [read, not bloody likely!] and then the _site_ generates the cert, signs it with your key, which must accept the key's password blah blah.

_or_ the user uses an 'intermediate' identity, where they Certify, through the painful but secure way once and only once, the 'intermediate' identity as 'representing them, and to the best of their knowledge, the site that supplies that intermediate identity has not been compromised'.

then you have reasonable security and a convenient means to do updates.

now, once you have this 'intermediate' identity, where digital signatures can effectively be added under the control of the site, you have a secure means to do 'auto-certs'.

on a 30-minute or 8-hour basis, whatever, the SERVER - with no user-intervention - performs some validation check. it sends out a special 'spam-checking' message, for example, with a digital signature in the content [so that the identity of the server can be identified and it doesn't get treated _as_ spam!!!!! :) :)]

the spam-checking message is received by a cooperating server, which goes 'whoops, yes, i _did_ receive that relayed-spam, oh dearie me, _that's_ bad, i'd better let the originator know'. so it forwards the message back to the server that sent it.

the server then matches it up, sees all the headers, and goes, 'hmm, this server, this server and that smtp all sent this message on WITHOUT checking it. that's bad. WA,WA, OOOPS - This Server Is Now Certified As Black-Listed as being a Spam Relay.'

the Certification is *auto-generated* by the server. no user-intervention is needed.

then, you generate the ORBs list from that black list.

luke

Brightmail - a clarification, posted 19 Jul 2001 at 06:28 UTC by ringbark » (Journeyer)

I recently posted an article here which suggested that the anti-spam service at Brightmail had been switched off. This is not the case.

Brightmail previously offered a free service which enabled individual users, regardless of their ISP, to filter incoming POP mail and discard the vast majority of spam received. Weekly, a summary email was sent to users concerned, listing the subject lines of this mail, which could (if desired) be retrieved and read. During the time that this service ran, I had several thousand emails removed and only one false positive, a marketing mail from Wizards of the Coast IIRC. Brightmail claims a false positive rate of less than 0.01%, which I believe.

Unfortunately, their free service was discontinued on 1st June. I presume they couldn't figure out a way to make this service pay, although I would have been pleased to pay a small fee for this service, just as I pay for anti-virus software.

Their FAQ says:
As of June 1, 2001, Free Brightmail was permanently shut down to allow us to focus on our commercial advanced message management products.

I look forward to their offering a product with the same functionality as their previous offering, even with a price tag.

Mine SPAM!, posted 22 Jul 2001 at 03:44 UTC by exa » (Master)

Someone mentioned mining. Machine learning happens to be my thesis subject so perhaps you wonder how well current algorithms work. :) Well, not bad at all! As a matter of fact, this would be much better than some web of trust or trust metric which I don't really take very seriously, seeming kind of contrary to the spirit of Internet. I do think that anybody should be able to send an email to anyone.

But you should also give people effective filtering tools. It would be great if my web browser was smart enough to just erase all those annoying ads.

You can actually do it using standard text mining algorithms. You just need to extract some meaningful feature vectors, ahem, and once you do that you just need to train a classifier with input "this mail is spam, this mail is not". Would be pretty effective. Train, test and be merry!
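As one concrete instance of "extract features, then train a classifier", here is a minimal naive Bayes sketch using word counts as features. The choice of naive Bayes, the tokenizer, and the four training mails are assumptions for illustration; the post does not say which algorithm is meant, and a real filter would need far more training data.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Very crude feature extraction: lowercase words."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens(text))

    def classify(self, text):
        """Needs at least one training mail per label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihoods with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokens(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training mails.
nb = NaiveBayes()
nb.train("make money fast buy now", "spam")
nb.train("cheap pills buy now free offer", "spam")
nb.train("meeting notes for the mod_virgule patch", "ham")
nb.train("lunch on friday with the kernel list folks", "ham")
```

After training, `nb.classify("buy cheap pills now")` comes out as spam and `nb.classify("notes from the friday meeting")` as ham, because each mail shares words with only one of the two training sets.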

I might post a more informative article later. Always wanted to try out this one ;)

Brightmail replacement, posted 3 Sep 2001 at 07:16 UTC by jmason » (Master)

Ringbark said:

I recently posted an article here which suggested that the anti-spam service at Brightmail had been switched off. This is not the case.

Brightmail previously offered a free service which enabled individual users, regardless of their ISP, to filter incoming POP mail and discard the vast majority of spam received. Weekly, a summary email was sent to users concerned, listing the subject lines of this mail, which could (if desired) be retrieved and read. During the time that this service ran, I had several thousand emails removed and only one false positive, a marketing mail from Wizards of the Coast IIRC. Brightmail claims a false positive rate of less than 0.01%, which I believe.

My spam filter, SpamAssassin, takes a similar tactic to Brightmail's, using content and header analysis to detect spam. It also uses RBL services and Vipul's Razor. Last time I ran stats on it, it differentiated between spam and non-spam mail correctly in 99.94% of cases. Pretty good -- not as good as Brightmail, but getting there!

Currently it runs as a procmail-style filter. I haven't made it into an ASP service like BrightMail have, but it is open source, so you're welcome to do so ;)
