The (Sad) State of Mobile XMPP in 2014

Georg Lukas, 2014-01-30 18:20

In a post from 2009 I described why XEP-0198: Stream Management is very important for mobile XMPP clients and which client and server applications support it. I have updated the post over the years with links to bug tracker issues and release notes to keep track of the (still rather sad) state of affairs. Short summary:

Servers supporting XEP-0198 with stream resumption: Prosody IM.

Clients supporting XEP-0198 with stream resumption: Gajim, yaxim.

Today, with the release of yaxim 0.8.7, the first mobile client actually supporting the specification is available! With yax.im there is also a public XMPP server (based on Prosody) specifically configured to easily integrate with yaxim.

Now is a good moment to recapitulate what we can get from this combination, and where the (mobile) XMPP community should be heading next.

So I have XEP-0198, am I good now?
Entering Coverage Gaps
Message Synchronization
End-to-End Encryption
Encrypted Media
And then iOS...
Summary

So I have XEP-0198, am I good now?

Unfortunately, it is still not that easy. With XEP-0198, you can resume the previous session within some minutes after losing the TCP connection. While you are gone, the server will continue to display you as "online" to your contacts, because the session resumption is transparent to all parties.

However, if you have been gone for too long, it is better to inform your contacts about your absence by showing you as "offline". This is accomplished by destroying your session, making a later resumption impossible. It is a matter of server configuration how much time passes until that happens, and it is an important configuration trade-off. The longer you appear as "online" while actually being gone, the more frustration might accumulate in your buddy about your lack of reaction – on the other hand, if the session is terminated too early and your client reconnects right after that, all the state is gone!

Now what exactly happens to messages sent to you when the server destroys the session? In prosody, all messages pending since you disconnected are destroyed and error responses are sent back. This is perfectly legal as of XEP-0198, but a better solution would be to store them offline for later transmission.

However, offline storage is only useful if you are not connected with a different client at the same time. If you are, should the server redirect the messages to the other client? What if it already got them by means of carbon copies? How is your now-offline mobile client going to see that it missed something?

Even though XEP-0198 is a great step towards mobile messaging reliability, additional mechanisms need to be implemented to make XMPP really ready for mass-market usage (and users).

Entering Coverage Gaps

With XEP-0280: Message Carbons, all messages you send and receive on your desktop are automatically also copied to your mobile client, if it is online at that time. If you have a client like yaxim, that tries to stay online all the time and uses XEP-0198 to resume as fast as possible (on a typical 3G/WiFi change, this takes less than five seconds), you can have a completely synchronized message log on desktop and mobile.

However, if your smartphone is out of coverage for more than some minutes, the XEP-0198 session is destroyed, no message carbons are sent, and further messages are redirected to your desktop instead. When the mobile client finally reconnects, all it receives is suspicious silence.

XMPP was not designed for modern-day smartphone-based instant messaging. However, it is the best tool we have to counter the proprietary silo-based IM contenders like WhatsApp, Facebook Chat or Google Hangouts.

Therefore, we need to seek ways to provide the same (or a better) level of usability, without sacrificing the benefits of federation and open standards.

Message Synchronization

With XEP-0136: Message Archiving there is an arcane, properly over-engineered draft standard to allow a client to fetch collections of chat messages using a kind of version control system.

An easier, more modern approach is presented in XEP-0313: Message Archive Management (MAM). With MAM, it is much easier to synchronize the message log between a client and a server, as the server extends all messages sent to the client with an <archived> tag and an ID. Later it is easily possible to obtain all messages that arrived since then by issuing a query with the last known archive ID.

Now it is up to the client implementors to add support for MAM! So far, it has been implemented in the web-based OTalk client, more are to come probably.

End-to-End Encryption

In the light of last year's revelations, it should be clear to everybody that end-to-end encryption is an essential part of any modern IM suite. Unfortunately, XMPP is not there yet. The XMPP Ubiquitous Encryption Manifesto is a step into the right direction, enforcing encryption of client-to-server connections as well as server-to-server connections. However, more needs to be done to protect against malicious server operators and sniffing of direct client-to-client transmissions.

There is Off-The Record Messaging (OTR), which provides strong encryption for chat conversations, and at the same time ensures (cryptographic) deniability. Unfortunately, cryptographic deniability provably does not save your ass. The only conclusion from that debacle can be: do not save any logs. This imposes a strong conflict of interest on Android, where the doctrine is: save everything to SQLite in case the OOM killer comes after you.

The other issue with OTR over XMPP (which some claim is solved in protocol version 3) is managing multiple (parallel) logins. OTR needs to keep the state of a conversation (encryption keys, initialization vectors and the like). If your chat buddy suddenly changes from a mobile device to the PC, the OTR state machine is confused, because that PC does not know the latest state. The result is, your conversation degrades into a bidirectional flood of "Can't decrypt this" error messages. This can be solved by storing the OTR state per resource (a resource is the unique identifier for each client you use with your account). This fix must be incorporated into all clients, and such things tend to take time. Ask me about adding OTR to yaxim next year.

Oh, by the way. OTR also does not mix well with archiving or carbons!

There is of course also PGP, which also provides end-to-end encryption, but requires you to store your key on a mobile device (or have a separate key for it). PGP can be combined with all kinds of archiving/copying mechanisms, and you could even store the encrypted messages on your device, requiring an unlock whenever you open the conversation. But PGP is rather heavy-weight, and there is no easy key exchange mechanism (OTR excels here with the Socialist Millionaire approach).

Encrypted Media

And then there are lolcats¹. The Internet was made for them. But the XMPP community kind-of missed the trend. There is XEP-0096: File Transfer and XEP-0166: Jingle to negotiate a data transmission between two clients. Both protocols allow to negotiate in-band or proxy-based data transfers without encryption. "In-band" means that your multimedia file is split into handy chunks of at most 64 kilobytes each, base64-encoded, and sent via your server (and your buddy's server), causing some significant processing overhead and possibly triggering rate limiting on the server. However, if you trust your server administrator(s), this is the most secure way to transmit a file in a standards-compliant way.

You could use PGP to manually encrypt the file, send it using one of the mentioned mechanisms, and let your buddy manually decrypt the file. Besides the usability implications (nobody will use this!), it is a great and secure approach.

But usability is a killer, and so of course there are some proposals for encrypted end-to-end communication.

WebRTC

The browser developers did it right with WebRTC. You can have an end-to-end encrypted video conference between two friggin' browsers! This must have rang some bells, and JSON is cool, so there was a proposal to stack JSON ontop of XMPP for end-to-end encryption. Obviously because security is not complicated enough on its own.

XMPP Extensions Graveyard

Then there are ESessions, a deferred XEP from 2007, and Jingle-XTLS, which didn't even make it into an XEP, but looks promising otherwise. Maybe somebody should implement it, just to see if it works.

Custom Extensions

In the OTR specification v3, there is an additional mechanism to exchange a key for symmetric data encryption. This can be used to encrypt a file transmission or stream, in a non-standard way.

This is leveraged by CryptoCat, which is known for its security. CryptoCat is splitting the file into chunks of 64511 bytes (I am sure this is completely reasonable for an algorithm working on 16-byte blocks, so it needs to be applied 4031.9375 times), with the intention to fit them into 64KB transmission units for in-band transmission. AES256 is used in CTR mode and the transmissions are secured by HMAC-SHA512.

In ChatSecure, the OTR key exchange is leveraged even further, stacking HTTP on top of OTR on top of XMPP messages (on top of TLS on top of TCP). This might allow for fast results and a high degree of (library) code reuse, but it makes the protocol hard to understand, and in-the-wild debugging even harder.

A completely different path is taken by Jitsi, where Jingle VoIP sessions are protected using the Zimmerman RTP encryption scheme. Unfortunately, this mechanism does not transfer to file exchange.

And then iOS...

All the above only works on devices where you can keep a permanent connection to an XMPP server. Unfortunately, there is a huge walled garden full of devices that fail this simple task². On Apple iOS, background connections are killed after a short time, the app developer is "encouraged" to use Apple's Push Service instead to notify the user of incoming chat messages.

This feature is so bizarre, you can not even count on the OS to launch your app if a "ping" message is received, you need to send all the content you want displayed in the user notification as part of the push payload. That means that as an iOS IM app author you have the choice between sacrificing privacy (clear-text chat messages sent to Apple's "cloud") or usability (display the user an opaque message in the kind of "Somebody sent you a message with some content, tap here to open the chat app to learn more").

And to add insult to injury, this mechanism is inherently incompatible with XMPP. If you write an XMPP client, your users should have the free choice of servers. However, as a client developer you need to centrally register your app and your own server(s) for Apple's push service to work.

Therefore, the iOS XMPP clients divide into two groups. In the first group there are apps that do not use Apple Push, that maintain your privacy but silently close the connection if the phone screen is turned off or another app is opened.

In the second group, there are apps that use their own custom proxy server, to which they forward your XMPP credentials (yes, your user name and password! They better have good privacy ToS). That proxy server then connects to your XMPP server and forwards all incoming and outgoing messages between your server and the app. If the app is killed by the OS, the proxy sends notifications via Apple Push, ensuring transparent functionality. Unfortunately, your privacy falls by the wayside, leaving a trail of data both with the proxy operators and Apple.

So currently, iOS users wishing for XMPP have the choice between broken security and broken usability – well done, Apple! Fortunately, there is light at the end of the tunnel. The oncoming train is an XEP proposal for Push Notifications (slides with explanation). It aims at separating the XMPP client, server, and push service tasks. The goal is to allow an XMPP client developer to provide their own push service, which the client app can register with any XMPP server. After the client app is killed, the XMPP server will inform the push service about a new message, which in turn informs Apple's (or any other OS vendor's) cloud, which in turn sends a push message to the device, which the user then can use to re-start the app.

This chain reaction is not perfect, and it does not solve the message-content privacy issue inherent to cloud notifications, but it would be a significant step forward. Let us hope it will be specified and implemented soon!

Summary

So we have solved connection stability (except on iOS). We know how to tackle synchronization of the message backlogs between mobile and desktop clients. Client connections are encrypted using TLS in almost all cases, server-to-server connections will follow soon (GMail, I am looking at you!).

End-to-end encryption of individual messages is well-handled by OTR, once all clients switch to storing the encryption state per resource. Group chats are out of luck currently.

The next big thing is to create an XMPP standard extension for end-to-end encryption of streaming data (files and real-time), to properly evaluate its security properties, and to implement it into one, two and all the other clients. Ideally, this should also cover group chats and group file sharing (e.g. on top of XEP-0214: File Repository and Sharing plus XEP-0329: File Information Sharing).

If we can manage that, we can also convince all the users of WhatsApp, Facebook and Google Hangouts to switch to an open protocol that is ready for the challenges of 2014.

transmitted from one place to another. For the sake of this discussion, streaming content is considered as "multimedia" as much as the transmission of image, video or other files.

because it prevents evil apps from eating the device battery in the background. I am sure it is a feature indeed – one intended to route all your IM traffic through an infinite loop).

Comments on HN

lolcats, porn, or whatever other kind of multimedia content you want↩
the Apple fanboy will object that this is a feature and not a bug,↩