Designing group chat state for a new chat system

[This is post 4 about designing a new chat system. Have a look at the first post in the series for more context!]

This is potentially the point at which the previous few blog posts, which have talked in vague, vacuous terms about the possibility of designing some new utopian chat system, start to gradually become more about actually getting the work done – purely due to the fact that, as previously mentioned, I have a deadline! It’s been a while¹ since I last said anything about the chat systems project – so, let’s get things rolling again with a practical post about our database schema, shall we?

The importance of a good schema

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

~ Linus Torvalds

It’s been said, as you can see in the quote above, that thinking hard about your data structures – the way you choose to store whatever data it is that you’re processing – is vitally important to the success of whatever it is that you’re building – so it’s worth having good ones, or else you’ll end up with hacky code that doesn’t quite work right². If you have the right data structures, the code usually sort of flows in to fill the gaps and make everything work right – whereas if you do the code first, it doesn’t usually work as nicely.

Now, data structures come in all shapes and sizes – there are trees, hash tables, association lists, binary heaps, and all sorts of other fun stuff that you’d probably find in some CS textbook somewhere. We could use some of these for our new chat system – and, in fact, we probably will use the odd hash table or two. However, given I want to keep things nice and boring, sticking to proven, reliable technologies for this project³, we’re probably just going to store everything in a set of relational database tables (i.e. rows and columns!).

And, a schema is essentially a description of a set of what columns mean what, and what tables you’re going to have. Which is what I’m going to write now, so, without further ado…

What do we actually want?

It’s a good idea to start with a discussion of what data we have, and what we’re trying to get out of that data. We’ve said that we want our new system to support our own funky blend of federation, so that’s going to need to be accounted for. Naturally, a chat service will have a bunch of users, each with their own usernames, passwords, emails, and other information that needs to be stored about them. We’ll probably have some messages as well, given users tend to make a lot of those when you give them the chance to.

As we’ve also discussed before, the primary function of our chat service is to convey messages from their source to their recipient, and do so in a reliable manner. That implies further that, in addition to the messages themselves, we’d also benefit from storing information about message delivery – did the messages get through, in the end, or do we need to try again sometime?

Since we’re supporting group chats, those also need to have some information stored about them. Our federation protocol⁴ requires us to store multiple different ‘versions’ of group chat state (remember, ‘state’ refers to things like the list of members of the chat, who has admin rights, and what the current topic is) – because it’s based on this whole funky consensus protocol stuff, we’ll need to keep track of what servers have proposed changes, and which changes we decided to actually accept.

A model for group chat state

The consensus algorithms we looked into previously allow us to get a bunch of trusted servers (‘sponsoring servers’) to agree on something, where ‘something’ is just some arbitrary value – usually with some kind of monotonically incrementing identifier or version number. It thus follows that we need some model for what that ‘something’ will look like; how will servers communicate information about a room’s state to one another?

Users, administrators, and moderation

Actually, we haven’t even specified what we want this room state to look like yet – there are still some unresolved questions around things as simple as how administrator rights / operator powers should work in group chats. Different platforms do this in different ways, after all:

IRC has a system of ‘channel modes’, where users can be given flags such as +o (operator; can kick, ban, etc.) and +v (voice; can speak in muted channels).
- Some servers then extend this, adding +h (‘half-operator’), +a (‘admin’), +q (‘owner’), and all sorts of other random gunk that confuses people and makes it hard to determine who can do what.
- Of course, server administrators can override all of this and just do what they want, on most implementations.
Matrix has ‘power levels’ - each user has an integer between 0 and 100 assigned to them, which determines what they’re able to do. A set of rules (stored in the room state, alongside the power levels) specify what power levels map to what – for example, only people with power level >= 50 can set the topic, etc.
- You’re not allowed to give people power levels higher than what you already have, and you can’t take away power from people who have more or the same amount of power as you. This is kinda required to make the whole thing work.
- Because Matrix is decentralised, it’s possible to get yourself into a state where everyone’s lost control of a chatroom and you can’t get it back. Of course, though, this is quite easy to avoid, by making the software stop you from shooting yourself in the foot.
WhatsApp has ‘group admins’ and…that’s pretty much it⁵. You either have admin rights, in which case you can do everything, or you don’t and you can’t really do much.
- This is very simple for users to understand.
- However, it makes WhatsApp completely impractical for all sorts of use cases; anyone who’s found themselves kicked out of a WhatsApp chat after someone went crazy and banned everyone after pleading for admin ‘in order to help keep things civil’ probably knows what I’m talking about.
Discord has ‘roles’ - there’s a bit in server settings where you can create sets of permissions, called ‘roles’, that you grant to different users in order to empower them to do specific things⁶.
- This is partially why Discord can play host to massive chatrooms for very popular games like PUBG and Fortnite: the system is very flexible, and suited to large-scale moderation.
- However, it’s also perhaps quite confusing for some people, especially if you’re just using it for smaller-scale stuff like a private group chat.
- Like Matrix, the roles are arranged in a kind of hierarchy; roles at the top of the hierarchy can make changes to roles below them, but not the other way round.

So, hmm, it might seem like there’s a lot to choose from here – but, in fact, it’s a bit more simple than you’d think. It’s immediately apparent that the more flexible Matrix/Discord systems can in fact be used to make more simple systems, like the IRC and WhatsApp ones; if all you want is group admin or not group admin, you can make two Discord roles (one with all permissions, one with none), or have two power levels (100 and 0, say, with appropriate rules for each one), and you’ve essentially got the more simple system using your complex one. (And you can do funky things with your user interface to hide away the added power, for those who don’t really need it.)

Taking some inspiration from this idea, and the age-old concept of an access-control list, here’s a proposed model. We’ll specify a set of capabilities that describe what actions can be taken – for example, speak, change-topic, mute-user, and so on – and then a set of roles, like the Discord roles, that are sets of capabilities. Each user gets a role, which in turn describes what they’re able to do. (If we want to make things work nicely for ex-IRC people, the roles can optionally come with a small letters, like +o and +v.) Unlike Discord and Matrix, there won’t be any hierarchy to the roles. Roles can only be modified by someone holding a capability called change-roles, and that’ll be the end of it. The sponsoring servers in our federation model will do this role check every time they receive a message or a request to change the state in some way, and refuse to apply the change if it’s not permitted.

The list of capabilities will eventually be written in some spec document somewhere, and thereby standardised across different server implementations. Essentially, they’ll work like IRCv3 capabilities, where vendors can make their own capabilities up if they want to (prefixing them with a valid domain name for whatever it is that they built).

To make things easier, the special capability * allows a user to make any change.

In JSON-esque syntax⁷, this would look a bit like:

{
    "roles": {
        "normals": ["speak", "change-topic"],
        "voiced": ["speak", "speak-in-muted-channel", "change-topic", "theta.eu.org/special-extension-power"],
        "admins": ["*"]
    },
    "memberships": {
        "server.org/user1": "normals",
        "server.org/adminuser1": "admins"
    }
}

Of course, there’s also state that doesn’t directly relate to users – the current group chat subject, whether or not guest users are allowed in, etc. Some state may have more complicated structure – for example, the Matrix people have a server ACL state event that lets you ban specific servers from taking part in rooms, which is pretty much its own little object with embedded arrays and booleans – which means we can’t just model this state as a simple stringly-typed key-value store, i.e.

{
    "state": {
        "subject": "example",
        "other_state_key": "other_value",
        // etc
    }
}

The question is, though, how extensible we want things to be: can users (or servers, for that matter) store arbitrary objects in our group chat state, or do they all have to follow some predetermined schema? Matrix takes the approach of letting users chuck in whatever they like (assuming they have the necessary power level) – essentially treating each group chat as a mini database – while pretty much every other platform restricts room state to a set of predetermined things.

Capability negotiation

I’m not entirely sure allowing arbitrary extensibility, in the way that Matrix does, is such a good idea – for two reasons⁸:

Allowing people to just store arbitrary data in your group chat seems a bit too much like an abuse vector. It’s a chat system, not a distributed database; allowing people to just chuck whatever stuff they want in there is a bit much!
- How would you display these arbitrary state events, given you don’t know anything about them?
- You’d need some limit on size, to prevent people just using you as free storage?
- In many jurisdictions, you’re responsible for content that gets put on your server. What if someone uploads stuff you’re not comfortable hosting, and stashes it away in some event your client doesn’t implement (and therefore doesn’t show)?
Usually, things in the group chat state are there for a reason, and it doesn’t make sense for servers to just ignore them.
- For example, consider when the Matrix folks rolled out the server ACL feature: servers not on the latest version of their software would just completely ignore it, which is pretty bad (because then it had no effect, as malicious actors could get into the room via the unpatched servers).
  - They ‘solved’ this by polling all the servers in a room for their current version number, and checking to see which ones still needed updating (which let them badger the server owners until it got updated).

Instead, it’s probably better to actually have some system – like the previously-mentioned IRCv3 capability negotiation⁹ – where servers can say “yes, I support features X, Y, and Z”, thus enabling those features to be used only if all of the sponsoring servers in a group chat actually support them. This solves the issue of extensibility quite nicely: non-user related state is governed by a strict schema, with optional extensions for servers that have managed to negotiate that.

Group Chat state summary

So, to sum up: we’ve managed to get a better idea of what the blob of group chat state, shared across all servers participating in a group chat and agreed upon via the federation consensus protocol, should look like: it’ll contain a capability-based system for regulating what users are allowed to do what, and it’ll contain a set of other pieces of strictly-typed non-user-related state, like the group chat subject! This might all seem a bit abstract for now, but it’ll hopefully become clearer once actual code starts getting written. On that note…

A note about timings

We managed to roughly sketch out group chat state over the course of around ~2,500 words (!), but there’s still all the other stuff to spec out: users, messages, and reliable delivery mechanisms. In addition, there are also a number of less conceptual things to sketch out, like how we’re going to ensure server-to-server transport happens securely (SSL? Encryption?¹⁰) and things like that.

And this all has to be done by March, at the absolute latest. Yay!

On a more hopeful note, we do actually have some code for this whole project – currently, it does the registration flow part of a standard IRC server (basically, the plan is to piggyback off of IRC for the client<->server stuff, to get things going), and has a very untested implementation of the Paxos consensus protocol. We’re using Common Lisp as our main programming language, which is fun¹¹!

Anyway, the rest of the spec stuff will follow in the next blogpost (hopefully quite quickly after this one…!) – if anyone actually has the tenacity to read my ~2,500 words about the annals of random chat protocol stuff, that is! We will also actually show how all of this translates into a database schema, which was sort of the point. Oops.

To be continued…

Err, two months. I was very busy applying to universities, okay! ↩
A large part of my criticism of Matrix centered around the fact that they did something funky – and, in my view, perhaps unnecessary – with theirs. ↩
Using some fancy NoSQL database has undone many a project in the past; SQL databases have had a lot of engineering and research effort put into them over the decades to make them work reliably and quickly, and that shouldn’t be given up lightly! ↩
Read the ‘funky blend of federation’ blog post linked earlier, if you need a refresher! ↩
I think they recently added some sort of limited permissions system for “who can change the topic: admins or all group members?” and things like that, but this is the gist of it. ↩
They also serve a vanity / novelty purpose; look at me, I’ve got a colourful name! ↩
Even though we might not be actually using JSON at the end of the day, it’s a pretty well-understood way to describe things. ↩
This IRCv3 issue is about a very similar problem (allowing arbitrary client-only message tags on IRC), and is also worth a read! ↩
These capabilities are, unfortunately, nothing to do with the user/administrator capabilities discussed earlier… ↩
The Matrix people went and used Perspectives instead of regular SSL PKI for their stuff, and they eventually had to move away from it. It’s worth learning from that mistake! ↩
It was going to be Rust, but then I started doing Lisp stuff and figured this would be more ‘interesting’… ↩