Recent blog entries for raz

MX6, An Immodest Proposal for Easing the Acceptance of Email over IPv6

Most current efforts to start accepting email over IPv6 at scale appear to be stumbling on two irreducible problems:

  • The IPv4-dependence of current abuse-prevention techniques and the absence of a reliable way for a receiver to say "apologies, I can't assess this message over IPv6, please retry over IPv4" mean that most receivers have been unwilling to even attempt accepting email over IPv6 generally.
  • This requirement is sufficiently novel that merely using an existing SMTP response code or defining a suitable new one is likely to cause breakage (e.g. bounces) when encountered by senders unaware of the special meaning of the response code.

If these problems are the relevant ones and are truly irreducible then what is required to allow more early adopters to start is a way for a sender to affirmatively indicate its ability to interpret a "fallback to IPv4" response code, even before an AAAA record is returned when looking up the MX; if it doesn't indicate its ability to do so then only an A record should be provided, meaning that delivery will only be attempted over IPv4. Proposed here is a means of doing so by specifying a new Service name for an SRV record ("MX6") and the sequence of actions around a new response code.

A receiver NOT implementing MX6 might publish the following:

only4.example MX 0 mx.only4.example
mx.only4.example A 10.0.0.1

while a receiver implementing MX6 might publish the following:

with6.example MX 0 mx.with6.example
mx.with6.example A 10.0.0.2

_mx6._tcp.with6.example SRV 0 0 25 mx6.with6.example
mx6.with6.example A 10.0.0.2
mx6.with6.example AAAA 0:0:0:0:0:ffff:a00:2

This would support the following scenarios:

An MX6-unaware sender delivering to an MX6-aware receiver

  • Looks up "MX with6.example"
  • Gets mx.with6.example and then 10.0.0.2
  • Connects via IPv4 to 10.0.0.2
  • Proceeds as usual

An MX6-aware sender delivering to an MX6-unaware receiver

  • Looks up "SRV _mx6._tcp.only4.example"
  • Gets nothing
  • Looks up "MX only4.example"
  • Gets mx.only4.example and then 10.0.0.1
  • Connects via IPv4 to 10.0.0.1
  • Proceeds as usual

Both MX6-aware, receiver is willing to accept the message over IPv6

  • Looks up "SRV _mx6._tcp.with6.example"
  • Gets mx6.with6.example and then 10.0.0.2 and 0:0:0:0:0:ffff:a00:2
  • Connects via IPv6 to 0:0:0:0:0:ffff:a00:2
  • Proceeds as usual

Both MX6-aware, receiver is unwilling to accept the message over IPv6

  • Looks up "SRV _mx6._tcp.with6.example"
  • Gets mx6.with6.example and then 10.0.0.2 and 0:0:0:0:0:ffff:a00:2
  • Connects via IPv6 to 0:0:0:0:0:ffff:a00:2
  • After the DATA phase gets 550 5.5.X (the proposed "fall back to IPv4" response)
  • Connects via IPv4 to 10.0.0.2
  • Proceeds as usual
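
The four scenarios above can be sketched from the sender's side. This is only a rough illustration and not part of the proposal: the Zone class is a hypothetical in-memory stand-in for DNS, plan_delivery and deliver are made-up names, and the "550 5.5.X" string is a placeholder because the exact response code is left open.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical in-memory stand-in for DNS; a real sender would query a resolver."""
    srv: dict = field(default_factory=dict)   # "_mx6._tcp.<domain>" -> target host
    mx: dict = field(default_factory=dict)    # domain -> MX host
    a: dict = field(default_factory=dict)     # host -> IPv4 address
    aaaa: dict = field(default_factory=dict)  # host -> IPv6 address


# The example records published earlier in this post.
EXAMPLE_ZONE = Zone(
    srv={"_mx6._tcp.with6.example": "mx6.with6.example"},
    mx={"only4.example": "mx.only4.example", "with6.example": "mx.with6.example"},
    a={"mx.only4.example": "10.0.0.1", "mx.with6.example": "10.0.0.2",
       "mx6.with6.example": "10.0.0.2"},
    aaaa={"mx6.with6.example": "0:0:0:0:0:ffff:a00:2"},
)

FALLBACK_REPLY = "550 5.5.X"  # placeholder, not an assigned code


def plan_delivery(zone, domain, mx6_aware):
    """Return the ordered (family, address) attempts a sender would make."""
    if mx6_aware:
        target = zone.srv.get(f"_mx6._tcp.{domain}")
        if target is not None:
            attempts = []
            if target in zone.aaaa:
                attempts.append(("IPv6", zone.aaaa[target]))
            if target in zone.a:
                # Only used if the IPv6 attempt ends in the fallback reply.
                attempts.append(("IPv4", zone.a[target]))
            return attempts
    # MX6-unaware sender, or no SRV record: plain MX, which only has an A record.
    host = zone.mx[domain]
    return [("IPv4", zone.a[host])]


def deliver(attempts, send):
    """Try each attempt in order, moving on only after the fallback reply."""
    used, reply = None, None
    for family, address in attempts:
        used, reply = family, send(family, address)
        if not reply.startswith(FALLBACK_REPLY):
            break
    return used, reply
```

Here send is whatever callable performs the SMTP transaction and returns the final reply line; any reply other than the fallback one, including an ordinary 5xx, ends the sequence.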

Further thoughts:

  • The use of the word "acceptance" in the title is intentional. Just about everything needed for senders to send over IPv6 is already in place, so this is not specifically about a delivery over IPv6. It's addressing a receiver concern and therefore about receiving - and more importantly about acceptance - over IPv6.
  • It is to be hoped that efforts to modify IPv4 reputation techniques to suit IPv6 are successful; however, this is far from certain at this point.
  • A very important characteristic of this proposal is that it makes no attempt at all to specify what reason a receiver might have for suggesting that a sender fall back to IPv4. There have been various proposals to specify in standards a set of authentication mechanisms or other practices a sender might be obliged to perform in order to be granted access via IPv6. The history of abuse-prevention techniques suggests that (a) any such specification is likely to be ignored by receivers anyway, (b) even if it isn't, our odds of correctly guessing what to specify such that it gets implemented but not abused are about nil and (c) the approach used by receivers is likely to include elements largely outside the control or visibility of senders, like end-user addressbook matching. The only interoperability items required are the means for a receiver to suggest fallback and for a sender to indicate - even before receiving an address record - a willingness to fall back, so that's all that's specified here. The process of receivers working out what they need and senders implementing it will take time to unfold and need not be specified at this point.
  • Other means of having a sender specify this willingness include:
    • Entirely separate protocols that don't require alteration to mail-server software (as is the case e.g. for Domain Owners for SPF, which helped accelerate rollout), but in light of the problem statement at the start of this proposal, this does not appear feasible.

    • Large receivers or intermediaries operating registries of which servers are capable of what, which is pretty obviously a terrible idea and doesn't solve the "before an AAAA record is provided" requirement (because of the use of shared resolvers) anyway.

    • Altering DNS responses by the AS# of the network containing the DNS resolver, as Google did when rolling out IPv6 connectivity generally. However, this is not workable as there isn't the same identity between network location and implementation (MX6 awareness in this case, IPv6 peering in Google's).

  • The use of 550 5.5.X is loosely inspired by both RFC 5321's 551 "User not local; please try <forward-path>" and by 452 "Requested action not taken: insufficient system storage" after RCPT TO (too many recipients for this transaction, retry this recipient and any subsequent ones in a later transaction). As this response code is not expected to be seen by non-MX6-aware senders, the choice between 4xy and 5xy for this novel meaning is somewhat arbitrary; however, choosing 5xy does mean that unaware senders who are mistakenly delivering over IPv6 will get an immediate fail (and thus an immediate bounce to the user), rather than be forced to wait their maximum retry time before doing so. The choice between X.5.X and X.7.X is somewhat arbitrary too; X.5.5 is almost exactly on point except that it does not connote "retry over IPv4", so a new code is required anyway.
  • The use of SRV records does open up the possibility of receivers specifying different ports for this SMTP-with-special-meaning (contrast submission over TCP/587), but this does not seem wise as it would invalidate widely implemented outbound TCP/25 control mechanisms. A better approach is to use TCP/25 at additional IPv6 addresses to separate the MX6 server(s) as such addresses are likely to be plentiful.
  • While the basis upon which to decide when to suggest fallback to IPv4 is entirely up to the receiver, some useful things to include after the fall-back-to-IPv4 response code might include a multiple-line reply with:
    • Authentication-Results: headers as would have been prepended to the message, to aid authentication problem diagnosis

    • Where a reputation service that allows registration is in use (e.g. something like Spamhaus PBL but for IPv6) should such a thing prove useful, a URL for starting that process

  • The name of the SRV Service is a little misleading (MX6 handles both IPv4 and IPv6, and MX continues to be able to handle both also), but is compact and appears to be about the right thing to call it (_mx6withfallbackto4 isn't very catchy). Hopefully this is the last time there's a need to introduce a non-backward compatible SMTP change that requires that a sender indicate awareness prior to receiving an address record; if so, then we won't be propagating still more SRV Service names every time a new extension is proposed.
  • One of the problems that led to this thinking was a particular problem for a large service provider whose customer wanted IPv6 receiving with no dependence upon a registration or other reputation service. This does solve that problem, albeit by creating a potentially more difficult one, which is the reason for the title.

Comments welcome.

A defensive strategy for accepting email over IPv6

Accepting email over IPv6 risks providing spammers with an easy entrance point because IP-address blocklisting is not likely to be viable for an address space as large as IPv6’s. The need to continue to accept email over IPv4 for the indefinite future provides a useful safety valve in that a receiver can push messages offered over IPv6 whose validity is uncertain back to the existing IPv4 service, thereby reducing the dependence upon – or even eliminating the need for – IPv6-address blocklists.

To take advantage of this a receiver needs whitelists (manually maintained, automatically generated, user addressbooks, provided by a reputation data provider, …) and the ability to test and act on domain authentication (SPF, DKIM, DMARC, …) during the SMTP conversation. Any message failing authentication, or passing authentication but not matching a whitelist, need merely be given a temporary failure (4xx) response code. A well-behaved MTA (i.e. a non-spammer) receiving 4xx responses will work through the receiver’s listed MXs until it finds one that gives an authoritative (2xx/5xx) response.
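
As a minimal sketch of that rule and of the sender behaviour it relies on (the function names and reply texts are illustrative assumptions, not taken from any particular mail-server):

```python
def ipv6_reply(auth_passed, whitelisted):
    """Receiver side: accept over IPv6 only when both checks succeed."""
    if auth_passed and whitelisted:
        return "250 OK"
    # Temporary failure: nudges the sender on to the next (IPv4) MX.
    return "451 Please try another MX"


def first_authoritative(replies):
    """Sender side: walk (mx, reply) pairs until a 2xx or 5xx reply is seen."""
    for mx, reply in replies:
        if reply[0] in "25":
            return mx, reply
    return None  # only temporary failures so far; queue and retry later
```

A message that fails either check is never rejected over IPv6; it simply ends up being judged by the existing IPv4 service.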

The argument that email receivers will need to accept email over IPv4 for the indefinite future is well-known and almost certainly correct, however organisations may find themselves wanting to accept email over IPv6 as well for at least two reasons:

  • The desire to pilot, experiment with or research acceptance of email over IPv6.
  • An externally imposed mandate that IPv6 be deployed for “all applications”.

The approach described here can be used in two different ways:

  • A defensive deployment from the outset for those who wish to get something working, but would prefer to deal up front with the risk of spammers exploiting the difficulties of IPv6-address blocklisting.
  • A fallback option for those who are willing to deploy without solving this problem, but wish to have a documented strategy for dealing with this problem when/if it arises.

In either case the benefit is the same: a production-use-ready approach for accepting at least some email over IPv6 with a safe fallback to IPv4 for the rest.

Ideally all of the relevant authentication mechanisms (SPF, DKIM and DMARC) can be processed and acted on during the SMTP transaction, but this approach can be adopted even if this is only true for SPF; the result will simply be that some of the email that could have been accepted over IPv6 will instead be pushed to IPv4.

Most types of whitelist data can be applied:

  • IPv6 address whitelists can be used as is.
    • A locally-maintained list of IPv6 addresses of mail-servers of trusted partners.
    • IPv6-address whitelists supplied by reputation data providers.
  • Domain whitelists can be used in conjunction with domain authentication (SPF (perhaps subject to DMARC’s alignment rules), DKIM, last-resort SPF data from a reputation data provider, …)
    • A locally-maintained list of domains of trusted partners.
    • A domain whitelist from a reputation data provider
  • In situations where end-user addressbooks are accessible during the SMTP conversation, the presence of a sender in the recipient’s addressbook can be treated as a whitelist match (subject to authentication checks as above)
    • For webmail providers this is pretty much a given
    • For others this is sometimes available from existing mail-server software, in other cases software can be used to automatically gather this data locally.
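
A rough sketch of how these three kinds of whitelist might be consulted together. All of the data is made up (documentation address ranges and example domains), whitelist_match is a hypothetical name, and authenticated_domains stands for whatever set of domains SPF/DKIM/DMARC processing has already validated for the message.

```python
import ipaddress

# Made-up illustration data only.
IP_WHITELIST = [ipaddress.ip_network("2001:db8:1::/48")]
DOMAIN_WHITELIST = {"partner.example"}
ADDRESSBOOKS = {"alice@receiver.example": {"bob@friend.example"}}


def whitelist_match(source_ip, from_address, authenticated_domains, recipient):
    """Return the name of the first whitelist that matches, or None."""
    ip = ipaddress.ip_address(source_ip)
    if any(ip in net for net in IP_WHITELIST):
        return "ip"  # address whitelists can be used as is
    domain = from_address.rpartition("@")[2].lower()
    # Domain and addressbook matches only count once the domain has authenticated.
    if domain not in authenticated_domains:
        return None
    if domain in DOMAIN_WHITELIST:
        return "domain"
    if from_address.lower() in ADDRESSBOOKS.get(recipient.lower(), set()):
        return "addressbook"
    return None
```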

In general, content-based anti-spam filters need not be used for messages which have passed any of the above. A particular exception is malware checking: clearly, it is not desirable to deliver malware even if it’s from a source that’s known to behave well, e.g. because someone’s PC has become infected and is emailing exploits or phish to each of the user’s contacts.

Weaker signals might also be used to decide to accept a message subject to content-based anti-spam filters not detecting a problem. These include:

  • The existence of an rDNS entry for the source IP address, the existence of a matching forward DNS entry and the use by the connecting MTA of the same name in the HELO/EHLO string.
  • The connection originating from an AS, or a network within one, known to be particularly stringent in its containment of abuse. To avoid confusion, I’ll use the term “greenlisting” to refer to the listing of IPv6 addresses or networks as being allowed to connect but still subject to content-based filtering.
  • The RFC5322.From domain name being registered with a registrar known to be particularly stringent in de-registering abusers. This would of course have to be done in conjunction with domain authentication as above. (This is also somewhat hypothetical, I’m not sure that any registrar is currently strict enough for this purpose.)
  • Even without a domain whitelist entry, the historical behaviour of the RFC5322.From domain in sending mail to the receiver’s IPv4 service. Again, this would have to be done in conjunction with domain authentication.
  • The presence of well-formed, non-anonymised whois information for the RFC5322.From domain and/or the source IP address block.

These are all a little less robust than competent whitelisting, and may have to be tried on a “sacrificial lamb” basis. However, as with the broad strategy of building on an IPv4 fallback, this is easier and safer to do than it was in an IPv4-only universe.
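
To make the first weaker signal and the resulting three-way outcome concrete, here is a small sketch. The names fcrdns_matches and disposition are hypothetical, the DNS answers are passed in as plain data rather than looked up, and the outcome labels are invented for illustration.

```python
import ipaddress


def fcrdns_matches(source_ip, ptr_names, forward_lookup, helo_name):
    """True when a PTR name equals the HELO name and resolves back to the source.

    ptr_names: names returned by the rDNS lookup for source_ip.
    forward_lookup: dict mapping a lower-case name to its address strings.
    """
    source = ipaddress.ip_address(source_ip)
    helo = helo_name.rstrip(".").lower()
    for name in ptr_names:
        name = name.rstrip(".").lower()
        # Parse addresses so differently-written forms of one address compare equal.
        forward = {ipaddress.ip_address(a) for a in forward_lookup.get(name, ())}
        if name == helo and source in forward:
            return True
    return False


def disposition(whitelisted, weak_signals):
    """Map whitelist and weak-signal results to one of three outcomes."""
    if whitelisted:
        return "accept"  # malware checks still apply
    if any(weak_signals):
        return "accept-subject-to-content-filter"  # "greenlisted"
    return "tempfail"  # 4xx, pushing the message to the IPv4 service
```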

Astute readers will notice that what I am describing is an implementation of the Aspen Framework that Meng Wong described in his Sender Authentication Whitepaper 8 years (!) ago. I’d suggest that:


  • The concern about the infeasibility of IPv6-address blocklists and the certain availability of the IPv4 fallback for the indefinite future provides an opportunity to implement this approach for IPv6 receivers that never existed in an IPv4-only environment.

  • The period of time that this has taken should be a strong warning to people who blithely assume that email can simply be moved to IPv6 by mandate. Email is an unusually tough problem, progress is slow.

  • That things move so slowly makes incremental approaches like the one described here more valuable than they might otherwise be. (There’s little point piloting a partial approach that will be rendered obsolete when the “complete” approach arrives 6 months later. If you assume that a complete approach is many years away, then there is more to gain from the deployment of partial approaches.)


It is conceivable that this will eventually be the beginning of a migration strategy, that over time so much email will be able to be accepted on a “we know something good about this message” basis (rather than a “we know nothing bad about this message” basis) that it will become viable to reject outright any email about which nothing good is known. I don’t actually expect that this will be the case, but also suspect that so much will change during the parallel running of delivery-to-MX over IPv4 and IPv6 that it’s not practical to predict how delivery-to-MX over IPv4 might be phased out. The important observation would appear to be that this approach provides a production-use-ready way to start.

Additional thoughts:


  • There is a legitimate concern about the additional workload that this will create – both for receivers and legitimate senders – in causing duplicate delivery of some/most/all legitimate email. I’d suggest that for early adopters this will not be a great concern, particularly while the total volume of email-over-IPv6 is small.

    • If many receivers adopt this approach when piloting accepting-over-IPv6 then the incentive to spammers to move to IPv6 will be greatly diminished in the first place, thus cutting much of the duplicate workload for receivers who senders can see are doing this. (This effect seems unlikely to be large enough to render the infeasibility of IPv6-address blocklists moot, but it would be a great side-effect!)

    • Early adopter senders are more likely to adopt full authentication anyway; however, where whitelisting is insufficient, encountering large numbers of receivers who push traffic to IPv4 may impose costs that senders aren’t willing to incur. I’d suggest that operational experience will tell us how this plays out and that senders and receivers will be in a better position to work out what to do about this when/if there’s enough traffic for it to be an actual problem.

    • This problem is likely to be particularly acute for forwarders for whom far less mail is likely to pass authentication, despite being legitimate. As in other contexts, forwarded streams are likely to require special handling (e.g. by not delivering them via IPv6 except where DKIM passes, or treating delivery-via-IPv6 as a problem to solve later). It may also be the case that receivers can simply greenlist known-strict forwarders and apply content-based filtering as usual. (Note that such forwarders would not appear on useful blocklists anyway.)




  • There is another concern about 4xx responses causing poorly-behaved sending MTAs to delay even before trying other listed MXs, much as there is for greylisting. RFC 5321 section 5.1 only specifies “In any case, the SMTP client SHOULD try at least two addresses.” If it turns out that a substantial number of sending MTAs limit themselves to just two addresses, then implementing this defensive approach would require listing only a single IPv6-reachable MX. This is sufficient from a fault-tolerance perspective (fallback to IPv4 being an intrinsic part of the design), but may run afoul of external mandates about MX configuration rules. Such rules could usually be adjusted as part of implementing this approach, but this may nonetheless end up being a show-stopper for the entire approach for some organisations. Only operational experience will tell for certain.


  • Also as for greylisting, there may be a problem with legitimate-but-poorly-behaved sending MTAs that never retry after a 4xx response. As these are rather small in number, the same approach that was used for greylisting is likely to be viable: the development of a database of known legitimate senders who don’t deal correctly with 4xx responses and simply greenlisting them. Mail from these sources should still be checked by content filters of course.

  • There may arise a concern that the use of addressbook data in deciding how to respond during SMTP might expose an addressbook-harvesting risk. I’d suggest that this was not a concern because it would only apply where domain authentication had succeeded with known good senders (not something that a botnet could usually do by itself) and even then, would only apply if the harvester had guessed a known sender+recipient pair. This appears to be too small an attack surface to worry about but, as ever with security concerns, this needs to be monitored and may need to be the subject of future work.


Relevant disclosure: I work for TrustSphere which supplies software that can be used for whitelist automation (TrustVault) and reputation data that can be used as described above (TrustCloud). On re-reading it occurs to me that this post makes a case for using TrustSphere’s products. I’d like to clarify that it is not the case that I believe the above (or wrote it without believing it!) because I work for TrustSphere but, rather, that I work for TrustSphere because I believe the above. See also my comments on this from a few years ago.

23 Nov 2012 (updated 23 Nov 2012 at 05:18 UTC) »
Towards ‘serverless’ social-networking

The rise of ‘cloud’ services and the rapid uptake of smartphones have created an unexplored – and perhaps quite large – niche for social software outside the control of advertiser-funded social network services (Facebook et al). While smartphone power and connectivity constraints make pure peer-to-peer social software on smartphones impractical, it is possible to construct a hybrid approach which moves much of the heavy lifting to undifferentiated/non-sticky services in the cloud while retaining owner/user control.

By contrast:


  • Many people, perhaps a majority, are perfectly happy to depend upon advertiser-funded social network services.


  • A visible minority is not and is therefore putting effort into personal server projects like FreedomBox to run a server in their own home which stores/shares/controls their own data and perhaps some of that of their friends. This approach avoids the power and connectivity barriers in smartphones, but requires the purchase, installation, connection, maintenance and physical securing of a device at the owner/user’s home and requires some technical expertise in dealing with the maintenance of the server operating system and software. Even if backups (and restores!) and upgrades are fully automated, diagnosing and correcting failures requires specialist expertise – and the time to use it – that the vast majority of people don’t have. This latter piece is a major part of the value that SaaS-providers generally – and social network services in particular – provide.


  • For people not concerned about governmental/law-enforcement interference, a [virtual] personal server in a data centre provides all of the other relevant benefits of a personal server and eliminates all of the physical aspects, but still requires specialist expertise in diagnosing and correcting failures.


  • For people who aren’t willing to run a server – whether virtual or real – but are willing to have their data in the hands of someone who isn’t selling advertising to fund their service and are willing to incur a small cost in time, money, inconvenience, etc., a variety of approaches are being explored. Notable amongst these are distributed/federated social networking software (e.g. Diaspora) and paid-subscription-only services (e.g. App.net).


  • Another group of people – myself included – would prefer not to run a server if possible – or are unable to – but would very much prefer that their data was under their own control. This is the unexplored niche.

The options, by level of concern:

  • Concerned about governmental/law enforcement interference: FreedomBox.
  • Concerned about control of data by others: FreedomBox on a virtual server; P2P app with non-sticky service help.
  • Concerned about advertising-funded sites’ skewed incentives and/or constant unpleasant changing of the rules: Diaspora on a friend’s server; App.net.
  • Not concerned: Facebook.

and by what the owner/user is willing to do:

  • Will purchase, install, connect, maintain and physically secure device at residence. Will maintain server software.
  • Will maintain server software.
  • Won’t maintain server software. Willing to pay $/time/inconvenience for increased freedom.
  • Will only use phone.

To understand where the additional niche exists, imagine that smartphones generally had:


  • effectively unlimited battery capacity (comparable to that of a PC plugged into a national grid)

  • effectively unlimited CPU capacity (smartphones are now so powerful that this is rarely a constraint, but it would be nice if a photo/video that the owner/user shared suddenly going viral didn’t make it impossible to use the phone for several hours)

  • effectively unlimited network capacity (enough that authorised people browsing the owner/user’s photos could be loading them directly from the owner/user’s phone as they viewed them)

  • a fixed IP address and no NAT between it and the public Internet (so it could serve data without help from hosted services)

In this environment, it would be possible to produce social network software that ran only on phones and talked only to peers on other phones. Unfortunately on current and likely future mobile phone networks, three of those things are always false and the fourth is usually false. It is possible, however, to use a certain class of network-hosted service as force-multipliers for an app running on a phone to give it capabilities [almost] as good as those four things, and to do so without giving control away:



  • The simplest approach uses an object-storage service (Amazon S3, Rackspace CloudFiles, OpenStack Swift, … possibly enhanced with a CDN for even better speed) to share objects (files, possibly encrypted and/or subject to access control) to make things that have been shared available to others. For asynchronous browsing by others of things that the user has shared, this immediately provides all four capabilities described above. Importantly it is possible to share to multiple services of this type at the same time and to add and remove services at will, meaning that the user is never tied to one provider.


  • To add timely notification (which improves interactivity and reduces polling workload), any of a number of IM services (notably IRC and XMPP/Jabber) can be used to deliver short ‘message available at https://storageservice/objectid’ notifications between apps in near-real-time. This is not ideal as (a) such services are not currently available on a pay-per-use IaaS/PaaS basis, meaning that the user is dependent upon the willingness of someone else to carry their traffic free of charge and (b) this use (machine-to-machine) may be outside the intended use of such services, meaning that this use may not be as reliable as typical IM use. To the extent that this use is possible, parallel use of multiple services is also possible because when the traffic is machine-to-machine, the difficulties of untangling multiple streams of messages can be resolved by automated means, meaning again that services can be added and removed at will and the user is never tied to one provider. (Note also that there are several other approaches to the timely notification problem, some of which may be considerably better options; IM services are simply the most obvious example.)
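
A toy sketch of the two pieces working together: sharing one object to several interchangeable storage services and producing the short notifications that would be sent over IM. Everything here is an assumption made for illustration: MemoryStore is an in-memory stand-in that models no real storage API, the URLs are invented, and naming objects by their SHA-256 digest is just one convenient choice.

```python
import hashlib


class MemoryStore:
    """Hypothetical stand-in for an object-storage service."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.objects = {}

    def put(self, object_id, data):
        """Store the object and return the URL at which it can be fetched."""
        self.objects[object_id] = data
        return f"{self.base_url}/{object_id}"


def share(data, stores):
    """Upload data to every store and return one notification per copy."""
    # Content-derived identifier, so every store holds the object under the same name.
    object_id = hashlib.sha256(data).hexdigest()
    urls = [store.put(object_id, data) for store in stores]
    return [f"message available at {url}" for url in urls]
```

Because the list of stores is just a parameter, adding or dropping a provider is a change to that list and nothing else, which is the "never tied to one provider" property described above.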

This is not strictly ‘serverless’, but it introduces the use of hosted services in a way which (a) doesn’t cede control to an advertiser-funded social network service and (b) doesn’t require that the owner/user be willing/able to take on the administration of a virtual/real server.

An important objection in both cases is that identifiers in domains controlled by others are still required (host names for the storage-services’ web-servers in the first case, nicknames/usernames in the second case), however it is not necessary for any of these to take the traditional role of an email address as a personal identifier known to the user’s contacts, they are merely communication endpoints and if the means of stating which ones to use is automated, use of multiple endpoints of multiple types can be sustained. This does require a less obvious means of representing identity, but note for comparison that until recently Facebook users used nothing analogous to an email address, users were located by their name and their proximity to others in the social graph. Each user has a unique identification number, but in general only developers need to know this. The recent addition of email addresses doesn’t materially change the means of locating people, it simply happens that Facebook has added email support. The same identifier-independence is true for the scheme proposed here: the use and propagation of multiple communication endpoints can happen out of the sight of owners/users.

Another important concern is that if too many people start using this approach, IM networks are more likely to start blocking this kind of use. I’d suggest – as a hypothetical example – that FreedomBox-like projects may provide a way to address this: in many cases someone owning a FreedomBox is likely to be willing to have their friends use the device to deal with real-time notification needs. The FreedomBox XMPP/Jabber server could perhaps be enhanced to allow the option for certificate-based authentication by any of the owner’s friends without requiring registration formalities, meaning that this approach could extend non-advertiser-controlled social networking software to a much, much larger audience than those who are willing to run a [virtual] FreedomBox themselves. Not everyone knows someone who’s willing to run their own server, but the pool of people who do know such a person is dozens or hundreds of times as large as the pool of people who are able to do so themselves, meaning that if this approach is of interest to FreedomBox-like projects then there may be an opportunity here to reach a much larger audience much sooner.

This post is not yet a call to action, more a partial statement of vision; I intend to write several more posts over the next few weeks/months fleshing this idea out.

(permalink at rolandturner.com)

Prefixing stdout and stderr with helpful markers

I’m testing a piece of logging code, so I care a lot about what goes where. I figured that there had to be a shell one-liner to allow me to mark stdout/stderr without any setup or code changes. Here it is:

( ( some_command | sed -e 's/^/stdout: /' >&3 ) 2>&1 | sed -e 's/^/stderr: /' ) 3>&1

So, for a trivial example:

$ ( ( ( echo out ; echo err >&2; ) | sed -e 's/^/stdout: /' >&3 ) 2>&1 | sed -e 's/^/stderr: /' ) 3>&1
stderr: err
stdout: out
$

As the example shows, the relative ordering of stdout and stderr lines may not be preserved.
