For the last week I’ve been working on making a big engine system thread-safe. I found that there is relative little information out there on how to multi-thread actual, real-world systems (if you need something faster than a single global lock). Most of the articles I found focused on simple data types, such as linked lists. So I thought it would be interesting to share my experiences.
The Truth is our toungue-in-cheek name for a centralized system that stores the application data. It is based around IDs, objects, types and properties. An object in the system is identified by a uint64_t ID. Each object has a set of properties (bools, ints, floats, strings, …) based on its type. Using the API looks something like this:
uint64_t asset_id = the_truth->create_object_of_type(tt, asset_type); the_truth->set_string(tt, asset_id, ASSET_NAME_PROPERTY, "foo");
The Truth is the place where other systems read and write their data. They can of course use internal buffers too, for better performance, but if they want to share data with the outside world, it goes through The Truth.
In addition to basic data storage, The Truth has other features too, such as sub-objects (objects owned by other objects), references (between objects), prototypes (or “prefabs” — objects acting as templates for other objects) and change notifications.
Goals of the design
Since The Truth is a low-level system that we expect to shuffle a lot of data through, it is important that the design is performant. Having a single lock for the entire Truth certainly won’t cut it. We need to have many threads accessing the Truth simultaneously. At the same time, we don’t want to have to obtain a lock for every single piece of data that we get from the Truth either, because that could be costly too.
So we’re looking for something like this:
Read/get operations should not require any waiting/locking.
Write/set operations may require waiting/locking, but they should only lock the actual object that is being touched. Threads should be able to write to different objects without contention.
One problem is immediately obvious here: If get() doesn’t use any locks, how can we make sure that the data we are reading is not being trashed by a simultaneous set() operation?
Luckily, we are rescued by the concept of atomicity. On modern hardware, writes and reads of aligned 64-bit values are atomic. I.e., other threads will always see either the old 64-bit value or the new 64-bit value, never 32 bits from one and 32 bits from the other. So as long as we are only writing 64-bit values, get() and set() can run simultaneously on the same object without creating problems.
Unfortunately, we also have larger objects that we want to store, such as strings or sets. Luckily, we can handle those with pointers. Instead of writing a string piece by piece, which could result in a garbled up value, we can just write a new char * into the object. This is a 64-bit value which can be written atomically.
In fact, we can take this one step further. Instead of writing individual values we can just create a whole new copy of the object, change some values and then atomically replace the old copy with the new one in a lookup table:

Object lookup table.
Object lookup table.
We could have this copying and replacing be implicit and completely hidden behind the get/set function interface, but that has two drawbacks:
The reader could see objects in a “half-written” state. I.e. even though each individual property would be either fully written or not, the properties themselves could come from different versions of the object. I.e. a reader could see y = 5 and z = 2, an object which never existed.
If a writer wants to set multiple properties (which usually is the case) we would have to copy the object multiple times behind the scenes, which is inefficient.
So instead, we make the interface explicit. We require a read() or write() operation to get a pointer to the object, and a commit() operation to finalize the writes:
const object_o *reader = the_truth->read(tt, id); float x = the_truth->get_float(tt, reader, POSITION_X_PROPERTY); float y = the_truth->get_float(tt, reader, POSITION_Y_PROPERTY); float z = the_truth->get_float(tt, reader, POSITION_Z_PROPERTY); object_o *writer = the_truth->write(tt, id); the_truth->set_float(tt, writer, POSITION_Y_PROPERTY, 5.0f); the_truth->set_float(tt, writer, POSITION_Z_PROPERTY, 5.0f); the_truth->commit(tt, writer);
Note that with this change, the reader’s pointer will either point to the new or the old object, so the reader will either see (7, 3, 2) or (7, 5, 5), never (7, 5, 2).
Also note that what we are doing here is pretty similar to how version control works — we are creating a new version of the object. This is also why I choose the name commit().
Also also note that we can’t delete the old object when we commit(), because there might still be readers that are using that old object (more on that later).
Some notes on consistency
With this design we are saved from reading half-written values. We are also saved from reading half-written objects. But we still can have other consistency problems.
The simplest one is a write conflict. What if two writers start writing the same object simultaneously? They will both get a copy of the old object, change some values and then commit. Whoever commits last will overwrite the other writer’s values.
A more complicated consistency problem can occur when multiple objects depend on one another. For example, a writer might do this:
Remove the reference from object A to object B.
Delete object B.
A reader could read object A before step 1 and discover the reference to B, then after step 2 attempt to access B. This would fail, since B is now deleted.
A classic example of the same problem in the database world is moving money between accounts. With withdrawing the money from one account and adding it to the other as separate operations a reader could see money appearing out of thin air.
Interestingly, we can solve both these problems without forcing the reader to lock.
To solve the write conflict, we can use the compare-and-swap (CAS) instruction which is available on all modern CPUs and also through the atomics library in C11 and C++11. Compare-and-swap atomically writes a 64-bit value only if the value there matches our expectation. Otherwise, it leaves the old value alone. It also tells us whether the write succeeded or not.
We can change commit() so that instead of just writing the new pointer, it does a CAS with the pointer we initially used to obtain our write-copy of the object. This means that the commit() will only succeed if the object hasn’t changed since we obtained it. In our example above, the second writer’s commit() would fail, and the object would keep the values set by the first writer.
To solve consistency problems involving more than one object, such as the money transfer, we need some kind of transaction mechanism. In other words, we need to be able to commit the changes to multiple objects simultaneously and have the reader see all these changes at once. This might seem tricky, initially, because we can only write one atomic value at a time. But we can repeat the trick we did before, when we need to write a bigger value, we just write a pointer to that value instead.
In this case, the pointer we can replace is the pointer to the entire lookup table. By replacing the lookup table, we can replace as many individual objects in it as we like:

Replacing the lookup table.
Replacing the lookup table.
A reader will either have a pointer to the old or the new lookup table. Thus, the reader will either see (A, B, C, D, E) or (A, B’, C, D, E’) and both sets of objects are consistent.
We can combine this with the CAS technique too, and only replace the root pointer if it hasn’t changed. This allows us to make a change to an arbitrary number of objects as a single transaction that will either succeed or fail. Pretty similar to the transaction model you would get in a relational database.
Another interesting thing: If you know how git works on the inside with tree objects and blobs, you can see that this approach is actually very similar to git’s.
The drawback of this approach is that every transaction requires changing the root pointer. (It requires copying the lookup table too, but we can reduce the cost of that by having hierarchical tables, similar to git.) Thus, there will be a lot of contention for the root pointer and that contention will cause a lot of commits to fail. (There is probably some way of being a bit smarter about this to let some of the commits that touch disjoint set of objects through, but I haven’t dug deeper into this.)
In the end, we have to consider what kind of consistency model works for us and if we are willing to pay for it. After all, if performance is not a big deal, we can just use a regular database.
In our case, we are not dealing with critical financial data. It’s not a big deal if a reader sees some object from one write and some objects from another write. So we don’t need the full transaction model.