Monday, January 12, 2015

Why email is hard, part 8: why email security failed

This post is part 8 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. Part 5 discusses the more general problem of email headers. Part 6 discusses how email security works in practice. Part 7 discusses the problem of trust. This part discusses why email security has largely failed.

At the end of the last part in this series, I posed the question, "Which email security protocol is most popular?" The answer to the question is actually neither S/MIME nor PGP, but a third protocol, DKIM. I haven't brought up DKIM until now because DKIM doesn't try to secure email in the same vein as S/MIME or PGP, but I still consider it relevant to discussing email security.

Unquestionably, DKIM is the only security protocol for email that can be considered successful. There are perhaps 4 billion active email addresses [1]. Of these, about 1-2 billion use DKIM. In contrast, S/MIME can count a few million users, and PGP at best a few hundred thousand. No other security protocols have really caught on past these three. Why did DKIM succeed where the others fail?

DKIM's success stems from its relatively narrow focus. It is nothing more than a cryptographic signature of the message body and a smattering of headers, and is itself stuck in the DKIM-Signature header. It is meant to be applied to messages only on outgoing servers and read and processed at the recipient mail server—it completely bypasses clients. That it bypasses clients allows it to solve the problem of key discovery and key management very easily (public keys are stored in DNS, which is already a key part of mail delivery), and its role in spam filtering is strong motivation to get it implemented quickly (it is 7 years old as of this writing). It's also simple: this one paragraph description is basically all you need to know [2].

The failure of S/MIME and PGP to see large deployment is certainly a large topic of discussion on myriads of cryptography enthusiast mailing lists, which often like to partake in propositions of new end-to-end encryption of email paradigms, such as the recent DIME proposal. Quite frankly, all of these solutions suffer broadly from at least the same 5 fundamental weaknesses, and I see it unlikely that a protocol will come about that can fix these weaknesses well enough to become successful.

The first weakness, and one I've harped about many times already, is UI. Most email security UI is abysmal and generally at best usable only by enthusiasts. At least some of this is endemic to security: while it mean seem obvious how to convey what an email signature or an encrypted email signifies, how do you convey the distinctions between sign-and-encrypt, encrypt-and-sign, or an S/MIME triple wrap? The Web of Trust model used by PGP (and many other proposals) is even worse, in that inherently requires users to do other actions out-of-band of email to work properly.

Trust is the second weakness. Consider that, for all intents and purposes, the email address is the unique identifier on the Internet. By extension, that implies that a lot of services are ultimately predicated on the notion that the ability to receive and respond to an email is a sufficient means to identify an individual. However, the entire purpose of secure email, or at least of end-to-end encryption, is subtly based on the fact that other people in fact have access to your mailbox, thus destroying the most natural ways to build trust models on the Internet. The quest for anonymity or privacy also renders untenable many other plausible ways to establish trust (e.g., phone verification or government-issued ID cards).

Key discovery is another weakness, although it's arguably the easiest one to solve. If you try to keep discovery independent of trust, the problem of key discovery is merely picking a protocol to publish and another one to find keys. Some of these already exist: PGP key servers, for example, or using DANE to publish S/MIME or PGP keys.

Key management, on the other hand, is a more troubling weakness. S/MIME, for example, basically works without issue if you have a certificate, but managing to get an S/MIME certificate is a daunting task (necessitated, in part, by its trust model—see how these issues all intertwine?). This is also where it's easy to say that webmail is an unsolvable problem, but on further reflection, I'm not sure I agree with that statement anymore. One solution is just storing the private key with the webmail provider (you're trusting them as an email client, after all), but it's also not impossible to imagine using phones or flash drives as keystores. Other key management factors are more difficult to solve: people who lose their private keys or key rollover create thorny issues. There is also the difficulty of managing user expectations: if I forget my password to most sites (even my email provider), I can usually get it reset somehow, but when a private key is lost, the user is totally and completely out of luck.

Of course, there is one glaring and almost completely insurmountable problem. Encrypted email fundamentally precludes certain features that we have come to take for granted. The lesser known is server-side search and filtration. While there exist some mechanisms to do search on encrypted text, those mechanisms rely on the fact that you can manipulate the text to change the message, destroying the integrity feature of secure email. They also tend to be fairly expensive. It's easy to just say "who needs server-side stuff?", but the contingent of people who do email on smartphones would not be happy to have to pay the transfer rates to download all the messages in their folder just to find one little email, nor the energy costs of doing it on the phone. And those who have really large folders—Fastmail has a design point of 1,000,000 in a single folder—would still prefer to not have to transfer all their mail even on desktops.

The more well-known feature that would disappear is spam filtration. Consider that 90% of all email is spam, and if you think your spam folder is too slim for that to be true, it's because your spam folder only contains messages that your email provider wasn't sure were spam. The loss of server-side spam filtering would dramatically increase the cost of spam (a 10% reduction in efficiency would double the amount of server storage, per my calculations), and client-side spam filtering is quite literally too slow [3] and too costly (remember smartphones? Imagine having your email take 10 times as much energy and bandwidth) to be a tenable option. And privacy or anonymity tends to be an invitation to abuse (cf. Tor and Wikipedia). Proposed solutions to the spam problem are so common that there is a checklist containing most of the objections.

When you consider all of those weaknesses, it is easy to be pessimistic about the possibility of wide deployment of powerful email security solutions. The strongest future—all email is encrypted, including metadata—is probably impossible or at least woefully impractical. That said, if you weaken some of the assumptions (say, don't desire all or most traffic to be encrypted), then solutions seem possible if difficult.

This concludes my discussion of email security, at least until things change for the better. I don't have a topic for the next part in this series picked out (this part actually concludes the set I knew I wanted to discuss when I started), although OAuth and DMARC are two topics that have been bugging me enough recently to consider writing about. They also have the unfortunate side effect of being things likely to see changes in the near future, unlike most of the topics I've discussed so far. But rest assured that I will find more difficulties in the email infrastructure to write about before long!

[1] All of these numbers are crude estimates and are accurate to only an order of magnitude. To justify my choices: I assume 1 email address per Internet user (this overestimates the developing world and underestimates the developed world). The largest webmail providers have given numbers that claim to be 1 billion active accounts between them, and all of them use DKIM. S/MIME is guessed by assuming that any smartcard deployment supports S/MIME, and noting that the US Department of Defense and Estonia's digital ID project are both heavy users of such smartcards. PGP is estimated from the size of the strong set and old numbers on the reachable set from the core Web of Trust.
[2] Ever since last April, it's become impossible to mention DKIM without referring to DMARC, as a result of Yahoo's controversial DMARC policy. A proper discussion of DMARC (and why what Yahoo did was controversial) requires explaining the mail transmission architecture and spam, however, so I'll defer that to a later post. It's also possible that changes in this space could happen within the next year.
[3] According to a former GMail spam employee, if it takes you as long as three minutes to calculate reputation, the spammer wins.

4 comments:

Andris said...

The Estonian ID card infrastructure supports S/MIME (there's a LDAP based key server for public keys etc.) but no one is actually using it besides maybe some government officials.

Even automated official e-mails to citizens do not use S/MIME but are plaintext e-mails that include an encrypted document container as an attachment, so you'd have to open the container in a government provided decryption software to see what's inside. Here's an example of a speeding ticket I once got from the police (removed all personal stuff but kept the encrypted container with the actual ticket): speeding_ticket.eml

Joshua Cranmer said...

@Andris:

I recall reading a presentation by the Estonian government that claimed in the neighborhood of 300,000 signatures. It's been some time, so I don't recall if that was explicitly S/MIME signatures or not (I assume it was, given the proximity to the discussion on how they made the S/MIME lookup work).

Even if S/MIME's use in Estonia is pitiful, it's still the case that S/MIME is more popular than PGP (1:~10-20) but still sees tiny share overall.

Andris said...

All people in Estonia have government issued ID cards (as it is the primary identification document, also for offline use) and I guess that 300k (out of 1.3M of total population) is the amount of people who actually use it digitally – but definitely not for e-mail. This is mostly used for online identification and signing documents (for example verifying bank transfers, signing contracts etc.). I have seen some tutorials of how to setup your Outlook to use S/MIME but I have never seen anyone actually using it this way. I'm pretty sure 99.99% of people would have no clue of what to do if someone sends them an encrypted S/MIME message. Maybe the people working with state secrets, like in defense or foreign affairs or whatever use it for e-mail but even if they do then they're still just a handful.

Andre said...

I am not aware of anyone who uses S/MIME. Most people I know use gpg. With the gptools for Mac that is very easy, for other platforms enigmail is not there yet, but it just works.

What annoys me about encryption is just the usability of engigmail compared to the gpgtools for Mac. All this is solvable and I wonder why the public sector does not step in. In a post-Snowden world we just can't accept suboptimal email encryption and leakage of our trade secrets.