email2

Posted 22 Aug 2003 at 13:10 UTC by mikehearn

Raph's diary entry on the suckage of the email system, and his hints at ideas for a new one, prompted me to write down the design for a new email network I was toying with a year or two ago but never did anything with.

For want of a better name, email2 is basically a cross between Usenet and email. The (IMHO broken) metaphor of sending messages, which was inherited from the "if it's not a metaphor, it's confusing" mindset of the time when a lot of our desktop technologies were first designed, is abandoned in favour of a message store. This opens up the possibility of several interesting new features.

Goals

Its goals are:

* Simplicity

* Usability (by which I mean you should not find yourself frustrated by how the system works). Things that frustrate me in the current system are how poorly mailing lists are handled, the way quoting works (or doesn't), the limited support for threading and so on.

* Bugfixes to the current system - for instance, it should have proper threading that doesn't constantly break.

* Genericity - there should be no need for awkward hacks to do things like mailing lists.

* Security - it should not be as easily exploitable by spammers as SMTP is, and it should be designed in such a way that makes integration of digital signatures and encryption easy.

Overview

The first thing to understand about email2 is that there is no longer any INBOX, as there is no longer the concept of sending messages. Instead, each user has a message store, and when you wish to 'send' a message, the contents of the email are placed in the message store in the same way that incoming messages are. That means that when you get a reply, it's threaded correctly with the message you sent - you can see your own correspondence alongside other people's, with no need to quote the entire message below the reply in order to preserve context.

Messages are stored hierarchically - all messages have a parent, even if that parent is simply the top level of the message store.

Messages appear in other people's message stores only if those people are in the To list. The To list can be changed at any point, in which case the message will dynamically appear in or disappear from those people's stores. The To list can inherit from the parent message.

Combined with a well-written client, this means that it's possible to "CC" people on entire threads very easily. Likewise, if you are no longer interested in following a thread, you can simply "delete" the first message, which will remove you from the To list, and you will no longer be bothered by replies. You could of course get yourself re-added at any point to undo that decision.

The To list allows messages to be reference counted. When a message no longer has anybody on its To list, it is moved to archival on the email2 server, until it's eventually deleted or moved to cold storage. All messages start with exactly one person in the To list: the person who originated the message. "From" in this case is merely useful metadata - it's possible for the person who wrote the email to delete it, while other people continue the thread it started.
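To make the model concrete, here is a minimal sketch in Python (an illustration only, not part of the proposal; the Message and Store classes and their method names are invented):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Message:
    """A logical email2 message, identified globally as guid@server."""
    author: str                       # "From": useful metadata, nothing more
    body: str
    parent: str | None = None         # GUID of the parent, or None for the store's top level
    to: set = field(default_factory=set)
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))

class Store:
    """One server's live messages plus its archive, driven by the To list."""
    def __init__(self, server_name):
        self.server_name = server_name
        self.live = {}       # guid -> Message, still referenced by somebody
        self.archive = {}    # guid -> Message, To list emptied, awaiting cold storage

    def upload(self, msg):
        msg.to.add(msg.author)             # every message starts with exactly one holder
        self.live[msg.guid] = msg
        return f"{msg.guid}@{self.server_name}"

    def remove_from_to(self, guid, user):
        msg = self.live.get(guid)
        if msg is None:
            return
        msg.to.discard(user)
        if not msg.to:                     # reference count hit zero: archive it
            self.archive[guid] = self.live.pop(guid)
```

The To set doubles as the reference count: the moment it empties, the message drops out of the live store.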

Consequences

This lets us simplify the mechanics of mailing lists somewhat. A mailing list is then simply an email sitting on a server somewhere, with the subscribers in the To list. Messages posted to the list do not carry their own To list; they inherit the original one (unless you want to copy extra people in on the thread). To unsubscribe, you remove yourself from the To list.

It also means it's possible to change messages after they have been sent. That's a highly useful ability, but one that raises concerns about accountability. However, as the server is the actual holder of the message, it'd be easy to record changes to a message in the database.

Message stores can of course be split into trees of folders. However, because message stores contain only references to messages floating in an abstract space (messages are simply files identified by guid@server style IDs), you can move entire threads around without breaking things.

Flipping the system in this way, so that "sending" a message just involves sending a pointer to a message uploaded to a server somewhere, has some interesting consequences for quotas. In the email system of today, people typically have hard mailbox limits and hard upload limits. For instance, many ISPs give you 10MB of mail space and don't let you send messages larger than 5MB.

Unfortunately, now we've moved messages into an abstract space, it becomes a lot harder to establish who "owns" a particular block of data. The easiest route is to say that whoever sent the message owns it, and therefore that send quotas, rather than receive quotas, make the most sense. However, if you send a message and then delete it (so you are no longer on the To list), it will nonetheless remain in the system until everybody who is on the To list deletes it. Do you still own that message even though you are no longer on the To list? If not, how do you deal with somebody who receives a lot of mail but simply never removes themselves from the To list?

Quotas

This raises some questions for potential server admins of email2. Are the resource usages even bounded at all? Worse, what if a user on a low-capacity server sends a very large message to a very large number of people? You get an instant DDoS.

Even if the server were able to cope with this, the design is still inefficient. You're potentially sending large amounts of information many more times than needed, and much further (in network-geography terms) than necessary.

One way to solve these issues is to build in a distributed replication system. When a message is uploaded to an email2 server, and the server begins sending notifications of the new message to the message stores on the To list, it can upload copies of the message to servers which feature more than N times in the To list. Because logical messages are identified by GUID, the client can check in multiple places for a copy before hitting the original server.
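A rough sketch of what that could look like (again an illustration; the threshold, the user@server address format and the download callback are assumptions):

```python
from collections import Counter

def replication_targets(to_list, threshold):
    """Servers appearing more than `threshold` times in the To list get a pushed copy."""
    counts = Counter(addr.rsplit("@", 1)[1] for addr in to_list)
    return [server for server, n in counts.items() if n > threshold]

def fetch(guid, home_server, candidates, download):
    """Try cached copies first; only hit the original server as a last resort."""
    for server in candidates + [home_server]:
        copy = download(server, guid)      # `download` stands in for the actual transfer
        if copy is not None:
            return copy
    return None

to_list = ["alice@example.org", "bob@example.org", "carol@example.org", "dave@example.net"]
print(replication_targets(to_list, 2))     # ['example.org']
```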

Now it becomes possible to look at the server's local cache, and cross-reference it with the users, to see which users are keeping messages in the store for long periods. If a user is "unfairly" holding a message active on the network, it might be worth having a mechanism by which a message can become fossilised - by which I mean the local client stores it, and it is no longer accessible from the network at all.

In order to keep the cached copies up to date, some form of publish/subscribe protocol is necessary.

There are many more features that it would be possible to add to this network. Let's have a look at how we might implement it.

Implementation

The most obvious protocol to use as the underpinnings of the network is Jabber. Jabber is not just an IM system; it's actually a rather flexible XML routing network as well. In particular, Jabber servers:

  • Have concepts of users, authentication, user databases
  • Know about publish/subscribe
  • Are active - by which I mean they run continuously rather than being activated by inetd - which lets you perform some interesting optimizations
  • Can perform dialback authentication between themselves, which makes server spoofing a lot harder.

The Jabber protocol, meanwhile, is simple, extensible, well designed and XML-based, yet it nonetheless supports pure binary out-of-band (OOB) transfers. That's important to avoid ugly hacks like base64-encoding attachments. Attachments can be transferred as pure binary in separate streams, while the message contents take the form of XML.

Using XML here isn't simply a matter of fashion. By using it smartly, we give ourselves room to improve things later. For instance, imagine the To list as a list of elements, and pretend that one email2 server is down or unreachable. Today that would be dealt with by a bounce - we can do better. We can encode the information about the failure as a child element of the relevant To element, meaning that the failure information is parseable by the client and is attached to the actual message which failed to deliver properly. That lets us produce more intuitive UIs.
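A rough illustration in Python of how that annotation might look (the element names are invented; no schema is being defined here):

```python
import xml.etree.ElementTree as ET

# Hypothetical element names - nothing here is a fixed wire format.
message = ET.Element("message", guid="1234-abcd")
to_list = ET.SubElement(message, "to-list")
for addr in ("alice@example.org", "billg@microsoft.com"):
    ET.SubElement(to_list, "to").text = addr

def record_failure(msg, address, reason):
    """Attach delivery-failure info to the failing <to> element instead of bouncing."""
    for to in msg.find("to-list"):
        if to.text == address:
            ET.SubElement(to, "failure", reason=reason)

record_failure(message, "billg@microsoft.com", "server-unreachable")
print(ET.tostring(message, encoding="unicode"))
```

The client can then show the failure next to the recipient it affects, rather than as a separate bounce message.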

Jabber addresses also have the advantage of looking like email addresses, making a transition easier. If I say your address is billg@microsoft.com, then the email2 server knows how to contact the relevant server to send a notification about a new message being available for the billg account.

Spam

The issue of spam is an interesting one. There is nothing (yet) inherent in this system which tries to prevent it, although the use of Jabber means that it's not possible to fake server addresses.

As messages must be uploaded along with the To list to persistent storage, it would at least be possible to use some basic heuristics to lock down your own facilities. Rules like "an account which uploads more than 1 message every 30 seconds for 5 minutes should be suspended" would probably catch most of it. You could also examine the length of the To list. That doesn't stop spammers setting up their own servers of course, at which point you have a problem in that it's hard to distinguish between legitimate mailing lists and bulk email.
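That rule could be a simple server-side check; here's a sketch (the numbers come straight from the example rule above, everything else is illustrative):

```python
import time
from collections import defaultdict, deque

WINDOW = 5 * 60              # five minutes, as in the example rule
MAX_UPLOADS = WINDOW // 30   # more than one upload every 30 seconds on average

recent_uploads = defaultdict(deque)   # account -> timestamps of recent uploads

def should_suspend(account, now=None):
    """Return True once an account exceeds the upload rate in the example rule."""
    now = time.time() if now is None else now
    window = recent_uploads[account]
    window.append(now)
    while window and now - window[0] > WINDOW:
        window.popleft()
    return len(window) > MAX_UPLOADS
```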

Because the Jabber protocol is an interactive, routed protocol - meaning that once a connection is established it's very easy to send small packets to arbitrary addresses or services - it gives us interesting possibilities for "instant blacklists". Let's say that all email servers must have a domain name. Jabber dialback can verify that this is the case: if a server gets an incoming connection claiming to be foobar.org, it will resolve and connect back to foobar.org to ensure the connection is a valid one. That prevents spammers simply rotating their IP addresses, as a domain name takes time and money for both registration and propagation. Now we can place a button in email clients that moderates a particular domain - trust metrics for email servers, anybody?

You can apply this technique to messages as well. If enough people moderate a message "-1 Spam", the message would be blacklisted and removed from users' message stores, perhaps even before they had seen it. There would still be spam of course, just as there are still trolls on Slashdot, but the majority would be free to ignore them.
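A minimal sketch of that moderation step (the vote threshold and the store mapping are assumptions, not part of the design above):

```python
SPAM_THRESHOLD = 10    # how many "-1 Spam" votes count as "enough" is an open question

spam_votes = {}        # guid -> set of users who moderated the message as spam
message_store = {}     # guid -> set of users who still hold the message

def moderate_spam(guid, user):
    votes = spam_votes.setdefault(guid, set())
    votes.add(user)
    if len(votes) >= SPAM_THRESHOLD:
        # Blacklisted: drop it from every remaining recipient's store,
        # perhaps before most of them have even seen it.
        message_store.pop(guid, None)
```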

Interop

Like any new system, email2 would suffer from the network effects of the previous one. Nobody wants to use a mail system which nobody else uses. For that reason, interop facilities are needed. If they are designed in from the beginning instead of being an afterthought, the result can be far slicker.

SMTP gateways are possible - incoming messages are treated simply as messages uploaded by another user. Clearly, in order to make this process seamless, it's necessary for the server to have the concept of multiple types of user. If no email2 server for billg@microsoft.com answers, but an SMTP gateway does, then it's reasonable to deduce that billg doesn't have email2 but that we want to interop with him. This might be similar enough to the transports technology in Jabber to actually make use of the same system - I haven't given it enough thought.
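Roughly, the routing decision might look like this (a sketch under my own assumptions; discover_email2_server() and the gateway registry are placeholders, not real APIs):

```python
SMTP_GATEWAYS = {"microsoft.com": "smtp-gw.example.org"}   # made-up gateway registry

def discover_email2_server(domain):
    """Placeholder: the real system would do some kind of DNS/Jabber lookup."""
    return None

def route_notification(address):
    domain = address.rsplit("@", 1)[1]
    server = discover_email2_server(domain)
    if server is not None:
        return ("email2", server)
    gateway = SMTP_GATEWAYS.get(domain)
    if gateway is not None:
        return ("smtp-gateway", gateway)   # billg doesn't have email2, but we can interop
    raise LookupError(f"no route for {address}")

print(route_notification("billg@microsoft.com"))   # ('smtp-gateway', 'smtp-gw.example.org')
```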

Writing such a gateway might not necessarily be simple, though. The ability to do things like delete messages but keep them around in the ether, change them after they've been sent and so on does not map well to SMTP. It's worth bearing that in mind as we design the system.

It'd also be nice to write an IMAP gateway, such that it becomes possible to use existing email clients (with reduced functionality of course) and sites like mail2web.com in order to access mail when perhaps all you have is a locked down net cafe kiosk.

I don't personally have time to work on this right now; I have to focus on my existing projects (autopackage and Wine). If somebody is interested in implementing it, please talk to me, as I have some more ideas on the gritty details that I haven't written down here. Given the total cockup that was instant messaging, it'd be nice if, when somebody does create new networks like this, they were created by the community.

thanks -mike


Existing clients?, posted 22 Aug 2003 at 17:47 UTC by jrobbins » (Master)

I really like this idea. I have also been thinking that entirely new communication protocols are needed, and that the answer is more of a multi-site web discussion forum with caching clients. I think that is actually just another way of saying what you are saying.

Spammers will still find a way to spam, unless many more barriers are built into the system.

Would everyone in the world need new mail clients?

What would be the bridge between email1 and email2 during the transition period (which could last 20 years)?

Sounds familiar, posted 22 Aug 2003 at 18:58 UTC by obi » (Journeyer)

Maybe check this out, from Bernstein (author of qmail):

http://cr.yp.to/im2000.html

Neat, posted 23 Aug 2003 at 01:46 UTC by brondsem » (Journeyer)

Neat ideas. I've had many (I'm sure all of us have), but yours went far beyond some of mine. With such a drastic break from the traditional sense of messages, interoperability will be very difficult. But of course it's a necessity, due to the pervasiveness of email. Or maybe not: you could design email2 to run in parallel with email. Some people will write gateways, but discourage it. If you want to participate in some email2 discussion, you'll have to have an account completely separate from your email account. This may make the adoption period of email2 much longer, but it would be much easier and cleaner.

I don't think we can accurately predict how people would use email2 to spread spam and viruses. Of course we can block the obvious methods of spamming, but the whole paradigm and functionality has changed and humans are very creative. We would have to start using the system and hope that when spam does arise, it can be blocked without too much difficulty.

This would be such a massive undertaking, not only in actual work, but in planning. Perhaps a roadmap of progressive stages could be proposed so that different parts of this can be implemented and used as others are developed. That development model would also spark users' interest by offering them something that's useful early in the development process. Otherwise, developing this as one massive project would take years for even a useful beta release.

Bloody 'ell, posted 23 Aug 2003 at 02:50 UTC by fejj » (Master)

* Simplicity

Hmmm, maybe it's just me but your system sounds more complicated than what we already have.

* Bugfixes to the current system, for instance it should have proper threading that doesn't constantly break.

This isn't a problem with the specs or how email was designed, it's a problem with the software implementations. The reason threading breaks so often is because many clients don't set the In-Reply-To/References headers properly, or not at all.

You can't blame the system because the software vendors can't follow the specs - especially when the specs aren't that complicated.

* Genericity - there should be no need for awkward hacks to do things like mailing lists.

See above about blaming the wrong things.

* Security - it should not be as easily exploitable by spammers as SMTP is, and it should be designed in such a way that makes integration of digital signatures and encryption easy.

As others have mentioned, you will never be able to reliably stop 100% of all spammers out there. People will always find a way.

Some people might point out that an ESMTP server w/ SASL support could be used to make it harder for spammers to relay messages. I'm not sure this is really all that true; they *could* just set up their own SMTP servers, I suppose. Anyways, just a thought.

Personally I think making an SMTP-like transport protocol that makes spoofing headers and the like more difficult (thus allowing people to actually trace it back to the source) is more valuable than anything else... but you neglected to mention anything like this.

mail stores

Sounds to me like what you are suggesting is a central database server with ACLs that stores all messages received - each message only being stored once with a refcount representing the number of users that have "received" it. Please note that Exchange does this :-)

I'm not trying to push Exchange here, I'm just pointing out that your 'design' is already doable with the current specifications. Note that the mail protocols do NOT specify how mail should be stored, there's nothing stopping an IMAP implementation from doing this for example.

Oh, and before you try to point out that "but the SMTP server will deliver 1 message per user that was meant to receive it!"

Bullshit. Read the SMTP spec :-)

Each message (even if to 500 people) only requires a single 'send' session (MAIL FROM:/RCPT TO:/DATA/etc) and so if the SMTP server was aware of the "store only one copy of the message for those 500 users to share" idea, it would Just Work (tm).

Spam

Given what I said above about one actual delivery for any number of users with a single SMTP session - "if server uploads more than X messages in Y amount of time" instantly screams that it hasn't been well thought-out yet. Clearly this won't work :-)

But Jeff, if we did things the Jabber Way (tm) where the client must start a new 'send' session for each user it is sending a particular message to, then it *could* work this way!

Okay, let me refute that...

What are the main reasons we wish to eliminate spam, exactly? Well, network congestion for one... right? That kills the "let's force the client to send once per user" idea pretty much straight away. This means we basically have to go back to what SMTP already does...

Another reason people want to eliminate spam is because it "decreases shareholder value" by reducing productivity due to users having to spend ever-growing amounts of time ridding their mailboxes of the stuff, right? Well, this is the real problem to solve... how can you reliably get rid of this shit? Many users swear by Bayesian spam filters or SpamAssassin or something. Reading raph's diary, it seems he is mostly frustrated with the speed of execution. Well, this is what you get when you implement this stuff in Perl.

Granted, even writing it in C or assembler is gonna have loading-time overhead. So it seems to me that this needs to be moved to a lower level in the stack. Currently, if your mail server receives a spam message for 50 people in your domain, that message gets processed by the spam filter 50 times - once per user. What needs to happen here is for the spam filters to process this message *before* it gets delivered to each user's mailbox (or, in the central mail store idea, before it gets delivered to the central mail store).
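A rough sketch of that idea - score once per message, not once per recipient (spam_score() and store() are stand-ins, not real APIs):

```python
import hashlib

verdicts = {}   # message digest -> spam score, so the filter runs once per message

def spam_score(raw_message):
    return 0.0   # stand-in for SpamAssassin, a Bayesian filter, etc.

def store(recipient, raw_message, score):
    pass         # stand-in for delivery into the mailbox / central mail store

def deliver(raw_message, recipients):
    digest = hashlib.sha1(raw_message).hexdigest()
    if digest not in verdicts:
        verdicts[digest] = spam_score(raw_message)   # filtered once...
    for rcpt in recipients:
        store(rcpt, raw_message, verdicts[digest])   # ...not once per recipient
```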

Now, of course, you probably saw one logic flaw in my argument there. And that is that oftentimes spammers will send spam specially crafted, one per user. Or, if it's sent to a list of users, often not all of them will be in the same domain, and so therefore if your company has 50 users you may actually get 50 sessions of one recipient each rather than a single session of 50 recipients. However, I don't see how this is any worse than the Jabber Way (tm).

Likely if the spam filtering was moved to a lower level, you'd see a performance boost. You'll always get at least *some* spam that targets multiple users in the same domain on a regular basis.

Implementation

groan. XML this, XML that. Bloody 'ell. What's wrong with MIME? As far as I've seen, once those XML worms eat into your brain, it's hard to ever get anything practical done again. To an XML person, every nail looks like a thumb. Or something like that.

My frustration with email, posted 23 Aug 2003 at 05:26 UTC by raph » (Master)

My frustration with email is many-faceted. Above all, I am hugely resentful about the amount of time it currently soaks up unproductively: nursing flooded servers, deleting spam messages, configuring spam/virus filters, tracking down why these filters are producing false positives, and so on. I could really use this time for better things. Further, while the amount of time so taken is getting bigger, the effectiveness of mail is getting worse.

Yes, much of the email infrastructure suffers from very poor engineering. However, I feel that as soon as we have a clear vision of what we want our email infrastructure to look like, it should be relatively clear how to build that. In the meantime, sure, we could rewrite Mailman in C so that it can process 100 virus bounces a second rather than maxing out at 5, but it's not a clear win in the grand scheme of things.

replies, posted 23 Aug 2003 at 10:29 UTC by mikehearn » (Journeyer)

obi: Thanks, I hadn't seen that. Seems nothing is ever new these days :)

fejj: I don't think it's that complex. The basic idea is what I was talking about, which is that mail is the sender's responsibility (DJ Bernstein put it better than I did, it turns out).

On threading - I think there *is* a problem with the current system, in that in order to see the entire actual thread you need to CC yourself on any mail you send, which isn't automatic, is easy to forget, etc. Yeah, you can add stuff to mail clients to do that, but if it weren't an optional part of the protocol (i.e. at present mail servers don't reject mails that fail to state their parent), it'd be harder to write broken clients.

If you were going to create a new SMTP-like protocol, you'd have broken compatibility anyway, so it would seem to make sense to go the whole way.

Sounds to me like what you are suggesting is a central database server with ACLs that stores all messages received - each message only being stored once with a refcount representing the number of users that have "received" it. Please note that Exchange does this :-)

Well, I've not used Exchange much. It sounds like it only works on one server though, which makes sense as you're supposed to centralise mail inside corporations.

Oh, and before you try to point out that "but the SMTP server will deliver 1 message per user that was meant to receive it!"

... I wasn't going to claim that ....

Finally, on implementation - I'm sure you could use MIME as well, but seeing as Jabber is XML based, and Jabber already provides a lot of useful infrastructure, it makes sense to base the message format and protocol also on XML, as that is what is easiest. The only thing more annoying than people who use XML for no reason are people who automatically assume that any use of XML is for fashion purposes ;)

New email system, posted 23 Aug 2003 at 10:45 UTC by Omnifarious » (Journeyer)

I'm actually working on a new email system right now. It actually has a much broader scope than email, but email is my first target because SMTP and other things have so many obvious problems.

I'm not doing the message store thing directly. That could currently be done largely with MIME using External-Body anyway.

The change I'm making is to the addressing and formatting of messages:

  1. All messages are sent to and from a pk-id. A pk-id is a 256-bit hash of a public key. Anybody can make one at any time. They can move it from ISP to ISP. It's also unforgeable. (See the sketch after this list.)
  2. Messages will be encrypted and signed by default. Once you start using pk-ids, why not do this?
  3. Messages will be in a binary format so the signatures will work. Text formats are subject to mangling that will render a digital signature invalid. It frequently happens to me with OpenPGP signatures.
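A minimal sketch of how such a pk-id could be computed, assuming SHA-256 over the serialized key (item 1 only says it's a 256-bit hash; the particular hash and encoding are assumptions):

```python
import hashlib

def pk_id(public_key_bytes):
    """Return the pk-id (hex) for a serialized public key: a 256-bit hash."""
    return hashlib.sha256(public_key_bytes).hexdigest()

# Anybody can make one at any time, and it stays the same across ISPs:
print(pk_id(b"...serialized public key bytes..."))
```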

There will also be ways to publish signed assertions that a certain external mail server will handle messages for your pk-id, or that your pk-id can be found at a particular traditional RFC822 address, or TCP/IP address (i.e. both IP and port #). These assertions will all have expiration times. The email client will handle making and publishing them automatically. Another assertion that can be published will be a 'proof of work' requirement that can be used to implement hash-cash, pay me to read your email, or 'please reply with this cookie' schemes.

It's a big infrastructure change and upgrade, but I think it can be done by slowly taking over SMTP, then moving past SMTP to something better.

I'm calling it CAKE for Key Addressed Crypto Encapsulation.

threading, accounting, posted 23 Aug 2003 at 14:48 UTC by dan » (Master)

It's important to distinguish between implementation simplicity and interface simplicity. The latter is what you need if you want people to have a pleasant time using your system; the former is what you want if you're trying to attract them to work on it.

It could be argued that the present mail system has erred far too much towards implementation simplicity: I could characterise Raph's complaint as an objection to being required to spend all his time working on his email configuration instead of just being able to use it.

That said, the proposed email2 system is at the stage of needing people to work on it before anyone can use it, and appears to me to be unnecessarily complex in its implementation anyway. But then again, as I haven't used Jabber and don't much like the XML religion, it might just be that you have good technical reasons for what looks to me more like gratuitous change.

Anyway. I have a couple of actual objections:

  • A single parent is not sufficient for threading - a thread should not be restricted to being tree-shaped. For example, you are away over the weekend while people discuss a project you're responsible for and ask you questions. When you return, you observe that the discussion is somewhat fragmented and wish to refocus it by sending a single mail replying to points raised in several messages.

  • A single parent is not sufficient for distribution lists. Suppose I want to send both to the board of directors and to my department, but not to everyone in the company. How do I do that?

I suspect further that using the same parent slot for both of these purposes (neither of which it's sufficient for anyway) will cause other problems, but I haven't thought about what these might be.

Incidentally, quotas are conceptually easy if you adopt the rule that the people who want to keep a message are the ones that pay for it. Initially that's the sender, who will (probably) want to keep it for at least as long as it takes to deliver; subsequently when each recipient views the message, they become accountable for a share as well. When the recipient deletes the message they no longer pay towards its storage: when it has no references at all, it gets GCed.

The sender needs to be able to tell whether or when an addressee has received the message, so he knows when it's safe to delete it or whether to try some other communication channel. If an addressee never ever receives the message, the sender needs to decide whether to delete it anyway. In fact, the sender's software could even arrange to delete messages automatically after n days whether received or not: this would be appropriate behaviour for mailing lists. Or spammers. The fundamental difference between a mailing list and a spammer is whether the recipients want the mail or not, so spammers automatically either get large bills for storage or don't get their mail through, whereas mailing list managers get to share their storage bills with all the happy recipients of their mail.
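A toy illustration of that accounting rule (my own sketch; the sizes, names and equal-share split are made up):

```python
def storage_bills(messages):
    """messages: list of (size_in_bytes, holders). Each holder pays an equal share."""
    bills = {}
    for size, holders in messages:
        if not holders:
            continue                    # no references left: the message gets GCed
        share = size / len(holders)
        for user in holders:
            bills[user] = bills.get(user, 0) + share
    return bills

bills = storage_bills([
    (10_000, {f"subscriber{i}" for i in range(999)} | {"list-owner"}),   # a wanted list posting
    (10_000, {"spammer"}),                                               # unwanted bulk mail
])
print(bills["list-owner"], bills["spammer"])   # 10.0 10000.0
```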

Re: replies, posted 23 Aug 2003 at 14:59 UTC by fejj » (Master)

mikehearn: yeah, I read Bernstein's rough ideas last night, and they were a bit clearer than what you wrote, so I had misunderstood what you meant by a mail store.

However, if you read Bernstein's website - he's got a handful of unsolved problems with his idea.

I'd have to say that I'm also a bit leery of the "well, we'll store the actual email message on the sender's machine and make the recipients access it through his box" idea - this is gonna complicate mail clients a LOT. Unacceptably so, in fact. It is also extremely inefficient.

He also says that a notification message will be sent to each recipient to let them know they have a new message (and presumably who sent it and info about the sender's mail store address so that they may read it). Hmmm, clearly this hasn't been well thought out :-)

Other than being extremely vague, it also means more bandwidth consumption. Isn't this one of the things we were trying to avoid?

You are again blaming the system rather than implementations regarding threading. Evolution supports a concept called vFolders which pool together any number of the user's mailboxes in a single view. To solve your threading complaint, one could simply create a vFolder view of their Inbox and Sent folders. This idea could even be brought to the point where the user could pool all related messages in a single view. Or whatever... This solves your "I have to CC myself on every mail I send out" problem.

As far as the other threading problem regarding mailers that don't set the In-Reply-To/References headers - how is your new protocol going to solve this? Fact is that it can't. So don't even pretend that a new protocol will solve this - not all new messages will have to reference another message. If the sender's client is brain-damaged, it will still break threading - no matter what the protocol is, because there is no way for the protocol to know that this new message being fed into the system is in reference to another message already in the system.

As far as SMTP-like protocols - there's no reason one can't be completely compatible with current SMTP server implementations.

There are a few problems with your "XML is great" section. First off, the "format" you describe doesn't exactly allow for the complex hierarchical structures that MIME provides. You describe that message contents would be in XML while the attachments would be in raw binary. Uh... this design seems to be suffering from the AOL view of the world (AOL's mail client cannot send nested multiparts, and so "attachments" seem to be separate from the actual message content when in fact they aren't). I also found it somewhat amusing that you thought using XML would rid us of "the ugly hack of base64 encoding". XML, too, must encode binary data :-) Oh, and its method is a lot less efficient compared to base64.

I should point out that a MIME-based system could easily handle raw binary "attachments" as it stands today, assuming we could rely on all mail servers between sender & recipient supporting binary transfers (Content-Transfer-Encoding: binary).

The only thing more annoying than people who use XML for no reason are people who automatically assume that any use of XML is for fashion purposes ;)

What was your non-fashion purpose for using XML again? :-)

I honestly think that before you do any more theoretical design of the email2 system, that you carefully read and understand how the current system works. Then you can examine its deficiencies and come up with ways to fix these problems in a more-informed manner.

PS: I still fail to see how email2 is any simpler than the current system. What is simpler? For whom? Certainly not simpler for the developers... and unless the developers are able to implement this overly complex system to a degree in which it is completely transparent to the user, I don't see how it'd be simpler for the user either.

References and Reply-To, posted 23 Aug 2003 at 16:10 UTC by Omnifarious » (Journeyer)

Near as I can tell, MIME message/external-body is completely sufficient for a message store approach. One interesting thing you could do with both the URL for external-body, and the References/Reply-To headers is use one of the P2P content URIs to reference messages. This would push all the caching and disk space usage decisions into the P2P content layer, which is really where it belongs anyway.

To me, static data should _always_ be referenced by a content URI.

What I'm doing with CAKE doesn't really change anything you'd do with RFC 822 except for the addresses. It's more about replacing SMTP.

standards/spam, posted 24 Aug 2003 at 02:31 UTC by elanthis » (Journeyer)

I agree with fejj - making more standards isn't going to magically make people comply. If foolish coders ignore the old standards, why would they magically pay attention to the new ones?

So far as the e-mail system stopping spam: one solution I thought of, that wouldn't be _too_ hard to implement (just difficult to get everyone to actually do it), would be to require all incoming e-mails to enter over a public key system, but require the public key of the sender to be stored in DNS. The sender connects, the receiver does a reverse lookup (this must be done to ensure you get the correct domain for the sender), finds the email key in the DNS hierarchy, and gives the sender the receiver's key encrypted with the sender's key. Only people with domain control (which locks out most ISP users, or at least allows the ISP to revoke keys for offending users) can install the keys into the DNS hierarchy. For spammers with their own domain and IP range (happens enough), it's at least a _lot_ easier to blacklist - you just knock out the domain of anyone who sends spam, since you know for sure the spammer owned the domain. (ISPs should of course manage customers' outgoing spam on the ISP's server.) It's even usable in conjunction with the current e-mail system - e-mail coming in without the encryption is "grey", e-mails from a blacklisted domain are "black", and e-mails from an encrypted sender (not in the blacklist) are "white."

I'm sure there's some fatal flaw in the plan above, 'cause I'm not particularly that bright, but it sounds good to me. ;-)

You could start now, posted 24 Aug 2003 at 05:25 UTC by dan » (Master)

Earlier this year I started getting bounces which indicated that spammers were forging my address in the From/Sender line, and since then I've routinely been PGP-signing all my outgoing email. Now if anyone wants to whitelist me - or blacklist me - they can get a copy of my key and be sure that the messages they're passing/dropping are actually from me, and not just someone else pretending to be.

These days some of my correspondents also sign their mail - and I'm not talking about crypto nuts for the most part, just ordinary techies. When the volume increases just a little more, I'll also be figuring out how to make my mail reader assign a higher score to signed mail, and I'm looking forward to the day that I can start dropping all unsigned mail in the probable-spam folder.

OK, so my key is only on the keyservers rather than in DNS (is there a RR for public keys, or is this still being discussed?) - centralised point of failure bad, blah blah - but it's a start. And this is something that any individual can start doing now; you don't have to wait for a major infrastructure change to make it possible.

Using GPG or PGP, posted 24 Aug 2003 at 06:28 UTC by Omnifarious » (Journeyer)

The problem with gpg or PGP signatures is that the standard is fragmented (PGP/MIME and the icky inline method). Support is spotty. OpenPGP and GPG are not interoperable for many things. Also, since mail is in a text format, it is often munged to make signatures invalid. Lastly, those programs are painful to use because of a huge plethora of options and the fact that in general, they were designed by crypto geeks for crypto geeks.

I agree that if everybody at least signed their email, things would be a lot better. But, they don't, and they won't with GPG or PGP. A new piece of infrastructure needs to be built that is both nearly transparent to the average user, and can't be used at all without signing and encryption being the standard practice.

Re: Using GPG and PGP, posted 24 Aug 2003 at 12:06 UTC by werner » (Master)

OpenPGP and GPG are not interoperable for many things

Huh? If there is anything in GnuPG not OpenPGP compliant, please tell me. Probably it's only a typo and you meant PGP. To be fair, I have to say that PGP 8 made a huge leap towards OpenPGP and there are no serious interop problems I know of.

Support for PGP/MIME is widespread and almost all decent Unix MUAs support it. Classic armoring has always been problematic for many people (except for those who don't need more than plain old ASCII) but for unknown reasons too many folks still think it is the best way to encode it and ignore the benefits of MIME entirely. BTW, MIME also offers a neat solution to the how-can-we-encrypt-the-subject problem: simply put it into an rfc822 container and have the MUA decode it on the fly.

Given the bad experience trying to have everyone migrate from classic PGP encoding to PGP/MIME, any attempt to drop SMTP in favor of something new seems to be entirely futile.

gpg for spam, posted 24 Aug 2003 at 14:35 UTC by elanthis » (Journeyer)

I don't see how GPG can really help with spam, unless you only allow certain people to mail you. Anyone can make a GPG key, anyone can put those in key servers, etc. The idea of making DNS responsible reduces the "one or more keys for every user" to "one key for each domain", which makes blacklisting actually feasible (and far more easily automated; it'd be simple for an anti-spam organization to keep a public domain blacklist), and also removes all effort from ISP end-users - any mail they sent through their ISP's mail servers would automagically use the domain key verification method.

GPG for spam, posted 24 Aug 2003 at 15:27 UTC by dan » (Master)

While it's true that anyone can create and upload a key to the servers, it's much less likely that anyone I trust will have signed it.

I'm in two minds about what to do if a spammer went to the trouble of creating a key and getting it attached to the pgp web of trust. Unless he actually maintained the email address in it as a valid place to send mail, that would still represent an abuse of the pgp system and I'd just remove my cert of whoever signed it.

In fact, I'd be strongly tempted to revoke my signature of anyone who signed a key used for spamming even if it really was a valid mailbox (unlikely though this situation is), even though this is not how pgp is supposed to be used. The ideal would be if there were multiple attributes for 'trust':

  • I trust this person to identify other people as people (i.e. sign their PGP keys)
  • I trust this person to identify other people as non-spammers
  • I trust this person to identify people whose binary content I'd be happy about running on my computer
  • ... etc

I haven't thought too hard about the transitive issues here, so there may be problems I haven't thought of. For the particular case of spammers I don't think it's a pressing issue anyway, but there are other applications it'd be useful for.

GPG for spam, posted 24 Aug 2003 at 19:08 UTC by dwmw2 » (Master)

elanthis writes:
The idea of making DNS responsible reduces the "one or more keys for every user" to "one key for each domain"

That's not necessarily the case. You can do it in DNS and still have one key per user -- which is almost certainly the way you'd want to do it.

If you receive a mail purporting to be from dwmw2@example.com, you look up a public key in either a TXT or a new type of record for something like 'dwmw2.example.com.' and require that the email be signed by the private key corresponding to that public key.
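A sketch of that per-user lookup using dnspython and a TXT record (both my choices; the real record type and format would need defining):

```python
import dns.resolver   # dnspython; any resolver capable of TXT lookups would do

def sender_public_key(address):
    """Look up the key published for user@example.com under user.example.com."""
    user, domain = address.split("@", 1)
    try:
        answers = dns.resolver.resolve(f"{user}.{domain}", "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return None    # immediate negative: bogus user, or no key published
    return "".join(part.decode() for rdata in answers for part in rdata.strings)

# Mail claiming to be from dwmw2@example.com must verify against this key:
print(sender_public_key("dwmw2@example.com"))
```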

Doing it per-user means you can get immediate negatives for bogus users at valid domains, and you can revoke or change individual users' keys if they're compromised or otherwise invalidated, without having to change the key for the whole domain.

Using a per-domain key would mean either you trust all the users with the private key, or more sensibly you require that all users use the domain's mail servers with authentication for outgoing mail. Letting users have their own keys on their machines and send mail directly is preferable, I suspect.

GPG/PGP a solution? No., posted 24 Aug 2003 at 21:04 UTC by Omnifarious » (Journeyer)

First, yes, I mean PGP, not OpenPGP. There are interop problems with PGP and gpg.

PK (not PKI *shudder*) technology allows you to have effective whitelists. Whitelists are a semi-joke without it. It isn't too hard for a spammer to figure out who you're likely to accept mail from and forge the header.

Practically every protocol on the Internet suffers from the exact same set of problems:

  1. An IP address is determined by network topography, not identity. If you want an identity independent of topography, you end up going with some centralized service like AIM or hotmail, or DNS.
  2. You can't be sure that the thing you got came from who it says it came from.
  3. You can't be sure that the data you're sending or have received hasn't been read or modified by someone along the way.

All of these problems can be solved by effective use of cryptography. Piecemeal approaches in which one protocol or another gets a crypto shot in the arm lead to lots of the same bugs and holes happening in every implementation.

Something needs to be done about this. The place to start doing something right now is email because, IMHO, email is facing a critical meltdown situation right now from both spam and worms.

OpenPGP kind of solves the problem for email, but not very well. Again, three reasons (because I like the number 3):

  1. For identity, people don't store gpg keys in their address books as the destination of email, they store an email address, which is determined by network topology. So, with OpenPGP, identity vs. topology is still broken.
  2. Also, it is very complex to use effectively. Most people don't want to care about the details of how a web of trust works, or what it means to sign a key, or anything like that. They don't want to know about public keys at all if they can help it. OpenPGP, in the forms in which it's being deployed right now is much too complex for the average person who has more important things to care about than cryptography.
  3. Lastly, because of a vain attempt at interop with people who don't have OpenPGP, most messages are sent in plaintext and simply signed. Only half the benefit of public key technology is being realized, even for those people who choose to climb the steep learning curve.

Something different is needed. That something should, as a conscious design choice, avoid any but the coarsest interop with older technologies in order to convince people to upgrade (because otherwise they won't). It also needs to offer enough obvious benefits to individual users so they have an incentive to upgrade. Lastly, its design should learn from the mistakes of older protocol designs in order to create as large a technical barrier as possible to attacks on the networks.

GPG and trust for spam, posted 24 Aug 2003 at 21:07 UTC by Omnifarious » (Journeyer)

Another thing GPG lets you do is have services which will investigate email being sent from a particular pk-id and create global blacklists based on their investigations. This forces the spammer to frequently generate new pk-ids, which increases the cost to them.

While it's not used this way right now, the web of trust can be used to assign negative as well as positive trust.

re: GPG and trust for spam, posted 25 Aug 2003 at 03:46 UTC by Mysidia » (Journeyer)

Another thing GPG lets you do is have services which will investigate email being sent from a particular pk-id and create global blacklists based on their investigations. This forces the spammer to frequently generate new pk-ids, which increases the cost to them.

You'd need to use something other than the pk-id, I think.

Otherwise, a spammer could start generating keys incorrectly in order to attack the system (say by generating certificates with the same pk-id as another legitimate user)

Because it's probably not too costly for the spammer to generate a new key for each run (pretend they're a new user), a reliable spam prevention system needs to use positive trust, not negative trust.

re: GPG and trust for spam, posted 25 Aug 2003 at 06:28 UTC by dwmw2 » (Master)

Mysidia writes:
Because it's probably not too costly for the spammer to generate a new key for each run (pretend they're a new user), a reliable spam prevention system needs to use positive trust, not negative trust.

Or both. If you get a message signed with a key you don't know, you can issue a 4xx temporary reject, and only accept it later when they're whitelisted, or reject it properly if they're blacklisted.

RFC : SPAMCENTER, posted 26 Aug 2003 at 11:27 UTC by mdupont » (Master)

RFC : SPAMCENTER

The spamcenter is a place to deliver spam to.

By collecting spam there, instead of bouncing it or delivering it to the user, you will be able to reach the following goals:

1. See who is spamming.

2. See what is being spammed.

3. See when the spammers are spamming

4. Allow users to retrieve non-spam

5. Allow patterns to be set up to compress the spam as just differences to other spam.

6. Allow testers of antispam software to run it on the data files.

7. Allow people to route their mail via the SPAMCENTER for screening.

8. use certifications and webs of trust to allow certified users to report new spam.

9. Allow certified users to create patterns to catch spam.

10. store all the data under the GFDL and GPL.

Some notes, posted 26 Aug 2003 at 19:06 UTC by Malx » (Journeyer)

First of all - DNS will be the center of trust for every solution you could imagine (PGP or not). If not, then some other central server or rooted hierarchy.

MS Exchange - yes, it is not for only one server. It supports replication. Same with Lotus Notes.

Spam will exist even in a new, protected system - it will be delivered through gateways (SMTP to Email2, web to Email2).

You don't have to change MUAs - just make supporting layers for SMTP/POP3/IMAP (same as in Exchange - there are IM and NNTP also :)

And the main thing: please make a list of typical e-mail usage. That means the full cycle:

  • you get the e-mail address from somewhere (web, vCard, from an e-mail, LDAP, etc.)
  • first-time post (that is where spam comes in) - this is not protected
  • everyday posting - the poster is known to the recipient
  • e-mail change, or posting from a different e-mail (different location, but same person)

What are you trying to fix?, posted 7 Sep 2003 at 23:24 UTC by kmself » (Journeyer)

mikehearn: from what I've read of your problem statement/proposal, I'm inclined to write this off as yet another poorly structured misguided effort.

You're mixing and matching client (MUA) and server (MTA) issues promiscuously and without discrimination.

Spam, connection authentication, delivery assurance, and service assurance / DoS resistance are MTA issues.

Quoting, threading, and content and sender authentication are MUA capabilities.

These are two separate issues. There is some overlap (MTAs shouldn't change opaque content, as described by RFC 2015, for example), but lumping them together is... naive.

Each of the bulleted items in your goals is simply...vague to the point of meaninglessness. Simplicity is in the eye of the beholder. Security has as much to do with configuration and use as design (though I'd grant a system could be built to be security-aware). Usability must state usability goals (addressing, delivery verification, spam reduction, etc.).

My recommendation is: scrap this discussion. List what does and doesn't work with the current mail delivery system. Prioritize your major concerns and gripes. Suggest possible solutions.

There are problems with email, no doubt. It should also be remembered, though, that SMTP is just that: a simple mail transport protocol. It's just a simple protocol, don't ask too much of it. It's concerned with transport of mail. Not validation, message integrity, spam filtering, etc.

The experience with both email and networking in general over the years shows that secure systems can be built over public bases. I see the issue as being one of finding the appropriate ways to plug in fixes, while retaining usability of the existing system. I don't think SMTP needs to be scrapped wholesale. And I firmly believe that vendors, particularly those with large, entrenched, illegal monopoly influence over the IT industry, will react in violent opposition to proposals which weaken their hold. Translated: you'll have the issue of misbehaving mail clients. A compelling case needs to be made for people to use alternatives, and the cost of this transition lowered greatly.

OSI Model For Email2, posted 8 Sep 2003 at 20:08 UTC by nymia » (Master)

I think the article was well written, except that it somehow failed to map the features of Email2 to the various levels defined in the OSI Model. It could have been a solid article explaining in detail where security or authorization can be implemented. Maybe security can be implemented at the Session or Application level, but putting everything on a single layer like the Data Link or Network layer is simply way beyond the definition.

SMTP seems to have several layers, each of which can be dissected and analyzed. There is a possibility that SMTP was initially designed with an emphasis on delivery, which is basically found in the Data Link and Network layers.

Overall, it doesn't hurt to add more code in the upper layers, probably squeeze more juice out of SMTP before it gets thrown in the bin.
