Designing a new chat system

[This is post 2 about designing a new chat system. See the previous post on chat systems for more context!]

Okay, so, the previous post on chat systems seemed to generate a fair deal of discussion (in that the Matrix lead developer showed up and started refuting all of my points¹…!).

Partially since I said I’d try my own hand at making something new in that post, and partially because this project is also going to be my Computing A-level Non-Examined Assessment², I now need to actually get on with it and try and make something! So, without further ado…

Key implementation goals

From the frustrations expressed in the previous post, and my own personal experience, I believe a working chat system probably wants to include some of the following points:

- Reliability. First and foremost, the system needs to deliver the messages to the recipient. Notifications should work, and be as timely as possible. The system should avoid just dropping messages on the floor without any indication as to this happening.
  - If messages cannot be delivered, the system should make this abundantly clear to the user, so they know about it and can try again.
  - Read / delivery receipts, as well as presence³, should be taken into account.
- Interoperability. As discussed previously, your chat system is useless if it doesn’t talk to the ones that are already out there, unless you’re really good at persuading people. There should be some provision for bridging to existing systems, and potentially some for federation as well.
  - More on this later; federation is arguably a questionable design decision, but we’ll discuss this.
- Persistence. People nowadays have phones and computers that aren’t online 24/7. The server should account for this, and implement some sort of basic “store-and-forward” mechanism, so you can send messages to people when they’re offline.
- Flexibility. Using an online chat system often means losing a degree of control about how you come across, and what information you send. Things like read receipts and presence should be configurable, to allow for users having different privacy preferences. Notifications should also be relatively granular, so you can avoid your phone vibrating every time someone says something in one of your many chatrooms.
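
To make the “store-and-forward” idea from the persistence goal a bit more concrete, here is a minimal Python sketch. Everything in it (the `StoreAndForwardServer` class, its method names, the `"delivered"` / `"queued"` return values) is hypothetical and invented for illustration; a real server would also need to persist the queue to disk and handle acknowledgements.

```python
from collections import defaultdict, deque


class StoreAndForwardServer:
    """Toy in-memory server: holds messages for offline users (hypothetical API)."""

    def __init__(self):
        self.online = set()                  # users with a live connection
        self.pending = defaultdict(deque)    # user -> messages waiting for them
        self.delivered = defaultdict(list)   # stands in for pushing over the wire

    def send(self, recipient, message):
        """Deliver immediately if possible, otherwise queue; tell the sender which."""
        if recipient in self.online:
            self.delivered[recipient].append(message)
            return "delivered"
        self.pending[recipient].append(message)
        return "queued"

    def connect(self, user):
        """Mark a user online and flush their queued messages in arrival order."""
        self.online.add(user)
        queue = self.pending.pop(user, deque())
        while queue:
            self.delivered[user].append(queue.popleft())

    def disconnect(self, user):
        self.online.discard(user)
```

Returning an explicit status from `send` is one way of meeting the reliability goal as well: the sender always learns whether the message actually arrived or is merely waiting.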

Ideally I’d do some kind of research to prove that these goals are actually prized by your average user⁴ – who knows, maybe sending emoji or something is actually way more important, and users will sacrifice one or more of the above in order to get special features like that⁵ – but we’ll roll with this for now. (In fact, if you happen to be reading this blog and have strong opinions on the matter, please leave a comment; it’d be really helpful!) I suppose we’ll just say this blog post is a living document, and leave things at that for now.

The first one of these bullet-points I plan to tackle is interoperability – and specifically, whether this magical new ideal chat system should federate or not.

The question of federation

Federation refers to the practice of a bunch of people agreeing on a common standard, and designing software using this standard to create an open network of interconnected servers. The example cited in the previous post was Mastodon, which uses the ActivityPub standard to enable distributed social networking; Matrix is another federated standard, aimed at chat⁶. Federated protocols, at least at first glance, aren’t doing too badly; this site gives an overview of various federated protocols and their uptake in terms of users.

In fact, the biggest federated protocol in the world is email – you can set up your own mail server with your own domain (e.g. hi@theta.eu.org), and start emailing with people on any other server; email is based on a set of open standards that anyone can implement.

The spam problem

However, being federated can be both a blessing and a curse! Having your communications platform be wide open, so anybody can create an account, or federate their messages into it, or whatever, seems like a good idea at first. However, there’s another side to it: how do you control spam and abuse? Most people get spam emails nowadays, which is one of the unfortunate side effects of this openness; if anyone can connect to your mail server and send you email, without having to do anything first to confirm themselves as legitimate or trusted, you’re opening yourself up to receiving a whole bunch of junk.

There are ways to combat this – spam email checkers nowadays, like the venerable SpamAssassin⁷, tend to use a rule-based approach, essentially giving emails a ‘spam score’ based on various metrics of sketchiness: does the email pass SPF and DKIM checks? Does the subject line contain something like “get rich quick”? Is the sender using a commonly abused free email service, like Gmail or Yahoo Mail? (and so on and so forth). These work pretty well for the run-of-the-mill spam, but still can’t really protect against actively malicious people who conform to all the standards, and still send you spam.
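
As a rough sketch of the rule-based scoring idea, here is a toy Python version. To be clear, this is not how SpamAssassin is actually implemented: the `Email` class, the rules, the weights and the threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    passes_spf: bool
    passes_dkim: bool


# Hypothetical values, invented for this example only.
FREE_PROVIDERS = {"gmail.com", "yahoo.com"}
SPAM_THRESHOLD = 5.0


def spam_score(email: Email) -> float:
    """Add up a penalty for every 'sketchy' property the email has."""
    score = 0.0
    if not email.passes_spf:
        score += 2.5
    if not email.passes_dkim:
        score += 2.5
    if "get rich quick" in email.subject.lower():
        score += 3.0
    domain = email.sender.rsplit("@", 1)[-1].lower()
    if domain in FREE_PROVIDERS:
        score += 0.5
    return score


def is_spam(email: Email) -> bool:
    return spam_score(email) >= SPAM_THRESHOLD
```

Note that a spammer who passes SPF and DKIM, sends from their own domain and avoids the obvious phrases scores 0.0 here, which is exactly the weakness described above.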

In many chat protocols, we see a similar story. IRC has suffered greatly at the hands of robots that connect to any IRC servers they can find, join all the channels they can, and start flooding text until something notices and kicks them off (having still managed to cause a considerable amount of annoyance). The freenode IRC network suffered a recent spamwave that was actually targeted at them, with people spamming rude and unhelpful messages about the network administration until something was done about it⁸. Matrix looks like it’s vulnerable to the same kind of thing as well⁹ ¹⁰. Spam is a non-negotiable part of the internet, and efforts to fight it are almost as old as the internet itself!

Mastodon manages to avoid a lot of these problems (at least from what I can see) simply as a result of what it is – in something like Twitter, you don’t really get spam in your timeline unless you explicitly choose to follow the spammers, and posts only start federating from instance to instance once an actual human issues that follow request. This is in contrast to, say, a wide-open federated chatroom, where people can just join on as many anonymous clients as they want whenever they feel like flooding a channel with text.

The full-mesh problem

The other main issue with federation is somewhat more practical: if your chat system, or social network, or whatever, is now distributed across multiple servers, how do you actually shunt messages around between them in a vaguely efficient manner? So far, the answer seems to just be “give up and use a full mesh architecture” – that is, if we need to send messages out to users on 20 different servers, connect individually to each of the 20 and deliver the message. (‘what happens when you activity post’ is a good, short explainer for how this works in terms of ActivityPub, Mastodon’s federation protocol.)
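
The cost of full-mesh delivery can be sketched in a few lines of Python. This is only an illustration of the counting argument, not ActivityPub’s actual delivery logic; the function name and the `user@server` address format are assumptions made for the example.

```python
from collections import defaultdict


def full_mesh_deliveries(sender_server, recipients):
    """Group 'user@server' addresses by server.

    In a full mesh, the sender has to open one connection per key in the
    returned dict, no matter how the remote servers relate to each other.
    """
    by_server = defaultdict(list)
    for address in recipients:
        user, server = address.rsplit("@", 1)
        if server != sender_server:  # local users need no outbound connection
            by_server[server].append(user)
    return dict(by_server)
```

With recipients spread across 20 remote servers, the dict has 20 keys, so the sending server makes 20 separate outbound deliveries for a single message.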

This works fine for smaller use-cases, but can be somewhat problematic when it comes to either (a) large follower counts, or (b) sending big files like images. As Ted Unangst says in his honk 0.1 announcement:

I post a honk, with a picture of a delicious pickle, and it goes out to all my followers. These are small messages, great. Their instances will in turn immediately come back and download the attached picture, however, resulting in a flood of traffic. If a particularly popular follower shares the post, even bigger flood.

There’s only so much honk can do here, since even trickling out outbound notes doesn’t control what happens when another instance shares it. Ironically, I think some [Mastodon] instances are spared from overload because other instances are already overloaded and unable to immediately process inbound messages.

Essentially, the issue is twofold: firstly, when you’re sending something in a federated environment, you usually have to talk to each server in your chatroom, or with your followers on it, or whatever, to deliver your message, which takes time and OS resources (most OSes limit the number of outgoing TCP connections, for example, and setting them up and tearing them down isn’t free in any case). Secondly, as mentioned above, if you do something like join a new chatroom or have your popular message shared by another larger server, hundreds of interested servers might go and hammer yours asking for information of some kind, causing you to become somewhat overloaded. (For example, protocols often require users to have cryptographic keys, in order to be able to verify message authenticity, and getting these keys usually involves going to the server and asking for them – which is a problem if there are suddenly loads of new servers that have never heard of your user before!).

In contrast, non-openly-federated protocols like IRC don’t have this problem; IRC is as old as dirt and still realised that having everything be full-mesh isn’t the best of ideas. Instead, servers are linked in a spanning tree (see ircdocs.horse for a better explanation and pretty diagram), such that servers only need to broadcast messages to the servers they’re directly connected to, which will forward the messages on further, propagating them throughout the tree without any need for non-connected servers to ever talk to one another. This is far more efficient – but it does assume a high degree of trust in all the servers that make up the network, which wouldn’t work in a federated context where you can’t trust anyone.
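
To make the spanning-tree idea concrete, here is a small Python sketch of a message being forwarded hop by hop through a tree of linked servers. It is a simplification rather than real IRC server-to-server code: the `broadcast` function and the server names are hypothetical, and it assumes the links really do form a tree (no cycles).

```python
def broadcast(links, origin):
    """Forward a message through a tree of server links.

    links maps each server to the servers it is directly linked to.
    Each server passes the message to every neighbour except the one it
    came from. Returns the list of (sender, receiver) hops.
    """
    hops = []
    to_visit = [(origin, None)]
    while to_visit:
        server, came_from = to_visit.pop()
        for neighbour in links[server]:
            if neighbour != came_from:
                hops.append((server, neighbour))
                to_visit.append((neighbour, server))
    return hops


# A five-server network: a - hub - b, with c and d hanging off b.
links = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub", "c", "d"],
    "c": ["b"],
    "d": ["b"],
}
hops = broadcast(links, "a")
```

Here server `a` only ever talks to `hub`, even though the message reaches all four other servers; the total number of hops is one fewer than the number of servers, and no server needs a connection to anyone it isn’t directly linked to.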

The CAP theorem problem

The other big problem is that this thing called the CAP theorem exists, which says we can only have at most two of the following three guarantees, if trying to store state across a distributed system:

- Consistency - all servers have a consistent view of the world
- Availability - the system can be queried at all times, although asking two servers may return different results
- Partition tolerance - the system doesn’t break if there’s a network split

Essentially, what this distils down to is the idea that you will have a network split or partition somewhere (where some servers are unreachable for whatever reason), and, when this happens, you will either have two sides of the network that have a different view of the world (thus violating consistency), or you will have to stop accepting new data in order to make sure that stuff doesn’t get out of sync (thus violating availability). This isn’t really something you can get around; it’s a fact of life when designing distributed systems.

Different systems deal with this in different ways:

- If an IRC network suffers a netsplit, the two sides of the network see all the users on the other side quitting, and they can’t talk to them any more.
- ActivityPub doesn’t really have to care about this, because there’s no distributed state anywhere; if an AP server wants to find out something about a user, it asks that user’s server.
- Matrix actually has this rather nifty “eventual consistency” property, where you sacrifice consistency in the short-term when netsplits occur, but the system eventually sorts itself out when everyone gets reconnected again.
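
The general shape of “eventual consistency” can be shown with a deliberately tiny Python sketch. This is not Matrix’s actual mechanism, which is considerably more involved; it is just a grow-only set, about the simplest data structure that converges after a split. The `Replica` class and its methods are hypothetical names for this example.

```python
class Replica:
    """One server's copy of a room's message history (toy model)."""

    def __init__(self, name):
        self.name = name
        self.messages = set()

    def post(self, author, text):
        # Writes are accepted even during a netsplit, so the two sides
        # may temporarily disagree (availability over consistency).
        self.messages.add((author, text))

    def merge(self, other):
        # Set union gives the same result regardless of the order or
        # number of merges, so replicas converge once they can talk again.
        self.messages |= other.messages


left, right = Replica("left"), Replica("right")

# During the split, each side only sees its own users' messages.
left.post("alice", "hello?")
right.post("bob", "anyone there?")
diverged = left.messages != right.messages

# After the split heals, both sides exchange state and agree again.
left.merge(right)
right.merge(left)
converged = left.messages == right.messages
```

This only works so neatly because messages are never deleted or reordered; state that can conflict, like who has administrator permissions, needs real conflict-resolution rules on top.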

Given that we’ve specified our ideal chat system wants to be persistent as well as interoperable, we’re going to have to consider this problem somewhere – unless we get rid of distributed state entirely, that is. However, chat systems usually do have some idea of state, like who has administrator permissions in a chatroom.

Conclusions

Federation has been the source of many good things (interoperability! no lock-in!) and many bad ones (spam! networking complications!). As such, it’s not immediately obvious that completely open federation works for chat – contrary to what was said in the previous post, wholesale copying ActivityPub might not really work for something like chat, due to spam and issues with distributed state. Instead, we’re going to need something else – and we’ll explore what that something else could perhaps be in further blog posts in this series!

¹ It’s worth reading through his comments, and coming to your own judgement as to whether you think what I’ve said is fair or not. I didn’t really respond, because I think we fundamentally disagree on whether the Matrix architecture is a good idea, and there wasn’t much point debating that further; the comments are there, and you’re welcome to form your own opinion!
² (Blogging about this process is arguably a questionable plan.)
³ Presence is the feature that lets you know whether someone’s online or not, or when they were last online. WhatsApp calls it “last seen”; IRC has the /away command; multiple other examples exist.
⁴ The A-level examiners tend to like that sort of stuff – and for a good reason; it’s nice to know that you aren’t just blindly spending time implementing something nobody wants.
⁵ Snapchat is a pretty good example of people sacrificing both interoperability (you can’t bridge Snapchat to anything else) and flexibility (you have very little control over read receipts, presence, and things like that). However, the added functionality seems to be worth it!
⁶ …which I somewhat harshly criticised in the last blog post.
⁷ This is what theta.eu.org runs for spam detection, actually, and it mostly works!
⁸ Actually, I’m still not entirely sure why this spamwave stopped; I’d be tempted to believe that the spammers giving up and stopping it themselves was probably the main reason, although I know some additional filtering protections were put in place.
⁹ Citation: I was lurking in #matrix on Freenode yesterday (2019-09-25, ~20:00) when some random angry user started coming in and flooding the channel with random spam images.
¹⁰ (I swear, I don’t have a vendetta against them or anything!)
