Under the Hood of Spark Email Real-Time Collaboration

No one would argue that collaboration is a crucial part of teamwork. Working in a team, we constantly need to share, discuss, delegate and eventually reach our common goal of making our business successful. Nowadays, we can use various tools for these purposes, and obviously these tools have evolved significantly in the past 5-10 years: we can chat in corporate messengers (Slack, Skype), edit documents together (Google Docs, Pages, Dropbox), review each other's code (Github Pull requests, Crucible), etc. But for some reason, collaboration in email is not all that popular, although it seems the idea is floating near the surface.

Let's imagine a situation when the CEO of a company writes an important email to investors and wants to add some figures from the latest financial report. He requests the required information from the finance department. But since that email should be sent on behalf of the CEO, he needs to edit his email himself and insert the financial information into the email (although that work might be delegated). What if he could simply share his email draft with the colleagues from the finance department, and they could add all the required figures directly to the email? Sounds perfect? But can we do it with the help of any popular email clients? I would answer – there is no easy way of doing that.

At Readdle, we decided to introduce a new way of collaboration for our users in our email client – Spark. The idea was to give our users the same intuitive and easy way of using collaboration mechanics they have in Google Docs.

How to do it technically? How to let multiple people make changes at the same time and avoid conflicts, allow formatting, undo/redo, inline images, etc? Those were the questions we had to answer.

If you think it's not that hard, let me show you the most basic example and the possible issues.

Let's say we have two people (Alice and Bob) who want to edit a document. For simplicity, the only thing they can do is to "insert character C in position N." Really easy! We need some way to transfer these commands from one user to another and we are done. Can you see a problem here? There is one. Let's suppose the initial state for both users is "123456789," and it is synced meaning both users see the same string. At some point, they notice that "0" is missing in the sequence. Alice decides to add it at the beginning of the text and Bob – at the end. Now Alice sees "0123456789" and Bob sees his own new version of the string – "1234567890." Since we are talking about collaboration, the results should be synced between the users allowing them to see changes from each other. Alice sends a command to Bob: "Insert '0' in position 0" and Bob sends a command to Alice: "Insert '0' in position 9." Now, each client receives commands from the opposite one and tries to adopt this state. What happens? Bob inserts '0' to the 0 positionand gets "01234567890." But, Alice uses the 9th position, to get the text "01234567809," which is not exactly what we expect to see. Here is the basic conflict we need to solve.

The problem of real-time text collaboration is nothing new, and there are several known algorithms to solve it. The most well known are OT (Operational Transformation) and CRDT (Conflict-free Replicated Data Type). That's where we started our research.

OT (Operational Transformation)

We chose OT to be the first try, as the more well known and discovered one.

OT is a quite old algorithm, published in 1989, and successfully adopted and extended with additional functionality by Google in Google Wave and later in Google Docs.

The basic idea is simple: users send their changes as operations and transform incoming operation based on existing ones. Let’s take a look at an example of how OT works for Alice in the previously described case.

She has her own operation "Insert '0' in position 0" = INS('0', 0) and now receives "Insert '0' in position 9" = INS('0', 9).

According to the idea of OT, we need to transform incoming operations considering outgoing operations since the last synced point. In other words, we need to correct the operation parameters in a way the outcome of the corrected incoming operation applied on the result of the outgoing operations is the same as the outcome of the non-corrected operation. It means we need to have rules for each pair of operation types, resulting in an N2 rules. For example, in our case, we have two insert operations; we need the rule to transform insert operation (A), based on another insert operations(B). We can use the following rule: If the position in the B operation is less than in A, we need to increase the A's position by the length of the text in B. Let’s check how it works in our case. The incoming operation for Alice is INS('0', 9) and outgoing operation is INS('0', 0). We transform INS('0', 9) considering INS('0', 0). Position 9 is greater than 0, so this position should be increased by the length of the inserted string, which is 1, so INS('0', 9) will become INS('0', 9+1) = INS('0', 10) and applying this operation results in "01234567890."

Bob has the opposite situation. INS('0', 0) should be transformed with respect to own operation INS('0', 9), but 0 is less than 9; no modifications are needed, and we keep INS('0', 0). Finally, we have the same resulting string as Alice: "01234567890." Hurray!

Well, it wasn't that difficult. But while scaling the approach, you may notice one problem here: if client A sent his operation, how do we know whether the client B has already received and applied it or not at the moment of sending his operations? Do we need to transform incoming operation from B or not? To solve this we need a way for each client to know the current state of other clients, which becomes costly in the case of multiple clients. To avoid these costs, we can use a centralized source of truth – server, which coordinates all the clients. Basically, clients sync only with the server; they only need to know about its state. The downside of this approach is to perform the task, the server basically needs to act as the client, while accepting operations from other clients. This means we need to have all the logic implemented on the client duplicated on the server, making future updates, testing and debugging more complex. Remember we only talked about one type of operation, but real life is a bit more complex (consider formatting, images, etc.). We need to invent and code transformations for all the possible pairs of operations. Things become more complicated when we need to add undo/redo functionality.

Having all of this in mind, we decided to take a break and consider other possibilities – CRDT.

CRDT (Conflict-free Replicated Data Type)

The general idea of CRDT is pretty simple: It is a structure that "can be replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between the replicas, and where it is always mathematically possible to resolve inconsistencies which might result." (Wikipedia). Or in plain-spoken words, the structure allowing us to apply operations in any order, with any delays and duplications, without any transformations. Sounds like voodoo. To give you an idea of how it is possible we can look at the simplest possible CRDT, which is the "grow only counter."

Grow only counter

What can be simpler than a counter? The only operation available is "increase by 1;" the only outcome is the final value of the counter. But things are never that simple if you want to allow to use this counter on multiple distributed nodes with peer to peer synchronization. Initially, CRDT structures were supposed to work in environments with unstable connections, where a loss or a duplication of a package is possible. Indeed we don't need this level of stability, as we have network protocols with guaranteed delivery. But we can take a look at the worst case scenario. How can we transmit updates of our counter? We can't broadcast the updated value, as it may already be changed by another user and become outdated. We can't send "increment" operations, as the messages may be duplicated. CRDT solution for this problem is simple and elegant: Let each node to have its own counter.

Nodes broadcast their values and each node stores the value of all the nodes. To get the resulting value, one sums up the values of all the nodes. Each node can change the value at any time and broadcast it to others, without any chance of conflicts. Duplication of events is not a problem. We only broadcast values, not operations; there is no risk of double applying. Delay is also not a problem. We receive our value later. Even swap of the packets in transmission is not a problem, as we know that value can only grow. We drop the packets with values lower than the current one. Nice and easy, elegant and suitable for any network.

This quick example gives an idea of how CRDT algorithms can solve collaboration issues. But, as you remember our goal is not a simple counter, but complex text collaboration. Luckily, there are CRDT algorithms for this problem. The first one we decided to try out was called Logoot.

Logoot

The basic idea is to assign a unique id to each character of the text in a way that IDs only grow in the text and use them in operations. It helps keep the text in sync no matter how we perform the changes. This is achieved due to the static IDs and their strict order. Let’s see how it works on the following example.

Two users are trying to write the word “HELLO.”

Alice wants to insert the first character 'H.' To create this operation, we should generate an id for this new character. To limit our choice and have something to start with, we add virtual characters S (for Start) and E (for End). Let S has id = 0 and E = 1000. The algorithm of generating a new id is an interesting topic, but now, for simplicity, let’s choose the first available, which is 1 for this new character. Now we have operation INS(‘H’, 1).

Bob receives the operation to insert 'H' with id = 1. We search the current text for the character with the maximum id less than 1 (note as the text has strictly growing ids, we can use binary search to do it quickly). Obviously, we find S as the character and insert 'H' after it. Success! Now we have 'H' at all the nodes.

Now Alice wants to insert 'L' after 'H'. Again, we choose the id between 1('H') and 1000. Again, we use the simple generator for id and take the next available, which is 2 and repeat the previous step on the other nodes. Success! We have “HL” on all the nodes.

But wait, it seems our Alice forgot to insert 'E' between 'H' and 'L.' We need an id which is 1<id<2. Hmmm... tricky. But who said our IDscan only be integers. We can have an array of integers. We select 1 on the first level and go to the second level, which is currently empty. There we can again use virtual 0 and 1000 and select any number between them. For example, we can choose 1 and the resulting ID will be 1.1. We have maintained strict order and the other nodes can apply the operation to their state. Nice! We are almost there.

Remember we are collaborating. At the same time, Bob notices the same mistake and tries to fix it. But he seems to be Dutch and decides to add 'A' instead of ‘E' ( as we know “hello” is “hallo” In Dutch).

As they don't know about each other, based on our current algorithm, both characters receive the same id, and we break our increasing sequence.

Looks like an issue, but fortunately, it is easy to fix.

First, let’s assign some IDs to each collaborator (we call it the site id). (For better visualization we will use here ‘A’ and ‘B’ ids, but in real life, it’s much more convenient to use the sequence of integers.) Add this id to each part of the character id. 1 created by A is (1:A). To determine the order of the two character IDs we use site IDs in a comparison, which again gives us strict and unified order on all the clients. This is it. Problem solved and now we have strict order even in a collision situation.

With this simple algorithm, we are able to add any number of characters in any order and always receive the consistent result for all the collaborators. As you can imagine, the delete operation is also simple. We specify the id of the character to be deleted. Insert and delete operations give us a full range of the operations on plain text.

You may notice one important thing here: With these operations, the system won't work if events come in the wrong order (like delete after insert vs insert after delete). There are multiple ways to solve this, but fortunately, in our case, we work with a centralized server and have protocol’s guarantees we receive all the events and receive them in the correct order.

Getting back to our initial task, this algorithm was good enough to start prototyping. We happily started coding and soon had our first Proof of Concept of the collaboration engine. It worked like a charm... except for one issue, but a nasty one. Let’s imagine two users are working offline and after a while, they start syncing their documents. First one has the events H - 1:A, E- 2:A, L - 3:A, L - 4:A, O - 5:A, second one has events B - 1:B, Y- 2:B, E - 3:B. What is the result of the merge?

HBEYLELO looks like a total mess, right? The precious work of two people was literally ruined! Definitely not production ready. With this in mind, we returned to the board to find a new algorithm.

RGA (Replicated Growing Array)

This was a really tough moment, as we thought maybe we’d made the wrong choice, and we should return to OT algorithms. Fortunately, after a bit of searching, we found another algorithm from the set of CRDT algorithms, called RGA (Replicated Growing Array). It doesn't have this merge issue, and it is quite similar to Logoot; it looked like it would be easy and fast to try. We started coding again.

Let's take a brief look at this algorithm. Each character has an id again, but now we generate the id by using the sequence in the following way: new id = (max_id +1, site_id). The insertion operation now contains the ID of the character to the left of the insertion point, the newly generated id for the symbol to be inserted and the symbol itself. In case of collisions (when two operations use the same position as the insertion point), we use the following rule: Check the characters to the right of the insertion point, skip all the characters with IDs greater than id of the new character. It allows the same strict order for all the collaborators.

What about deletion? Here is the first downside of RGA. As all the operations use the ID of existing characters as references, we can't delete one, since it may be in use by the other client's events. To solve this, we mark deleted character as a tombstone, remove it from the text, but keep in internal storage. It seems we have all the pieces of the puzzle: We can insert, we can delete, and we can edit the plain text!

We transformed our Logoot implementation into RGA and started testing it on all the cases we could come up with. Luckily, the result was pretty solid, and this time we didn't find any critical issues, except a few problems with the performance we solved with some optimizations.

When we reached the more or less stable version of the algorithm, we started implementing it in production.

Results

In conclusion, we have successfully implemented RGA for text collaboration and currently use it for the shared drafts functionality in Spark.

Let’s have a quick look at the pros and cons of the algorithm:

Pros

1. Simple algorithm making testing, debugging, adding new features and life in general easier :-).

2. All the merge logic is consolidated on the client side, allowing us to keep the back-end simple. Therefore, in case of further updates, we don’t need to make changes on both sides.

Cons

1. Full event history should be stored. There is no easy way to save an intermediate snapshot.

2. A full history should be processed when the user opens a shared draft to reconstruct the final state of the document causing a delay on opening.

3. IDs are shuffled in the text; to find character index for id we have to use slow linear search.

We didn't stop with this result. The next steps support inline images, formatted text and undo/redo. We have added broadcasting of the current selection with live updates. Of course, we have plans to add more and more features in the future.

Download Spark now and check out yourself how real-time collaboration works.