Data Structures Part 1: Bulk Data

July 24, 2019
protect

Any programmer can benefit from some understanding of different data structures and how to analyze their performance. But in practice, I’ve never found any use for AVL treesred-black treestriesskip lists, etc. Some data structures I just use for one particular algorithm, and nothing else (e.g., heaps to implement the priority queue for A* search).

In most of my day-to-day work, I get by with surprisingly few data structures. What I mostly need is:

  • Bulk data — a way of efficiently storing a large number of objects.

  • Weak references (or handles) — a way of referencing objects in the bulk data without crashing if an object has been deleted.

  • Indices — a way of quickly accessing specific subsets of the bulk data.

  • Arrays of arrays — a way of storing dynamically sized bulk data objects.

In the next few blog posts, I’ll show how I implement these things. Let’s start with the simplest and most useful one — bulk data.

Bulk Data

There is no good term for it that I’m aware of, but I use bulk data to mean any large collection of similar objects. It might be things like:

  • All the bullets in the game.

  • All the trees in the game.

  • All the coins in the game.

Or, if you are writing your code at a higher abstraction level, it might be things like:

  • All the entities in the game.

  • All the meshes in the game.

  • All the sounds in the game.

Typically each system (rendering, sound, animation, physics, …) in the game has a couple of different types of objects that it needs to keep track of. For example, for a sound system it might be:

  • All the sound resources that can be played.

  • All the currently playing sounds.

  • All the effects (fades, pitches, etc) that are being applied to the sounds.

For the bulk data, I will assume that:

  • The order in which the objects are stored doesn’t matter. I.e., we think of it as a set of objects.

  • Each object is represented as a fixed-size POD-struct that can be moved or duplicated with memcpy().

It is certainly possible to think of cases where order does matter. For example, if the objects represent renderable items we might want to sort them front-to-back before rendering to reduce overdraw.

However, in most cases, I think it is preferable to sort the data as it is being used, rather than storing the data in a sorted container, such as a red-black tree or B-tree. For example, we can sort our renderable objects front-to-back before passing them down to the renderer, or sort our files alphabetically before showing them in a list. It might seem expensive to sort the data every frame, but in many cases, we can do it in O(n) with a radix sort.

As for only using POD-structs, I prefer plain C structs to C++ objects, because it is easier to see what is going on in memory and reason about the performance implications. However, there are situations where you might want to store something in the bulk data that doesn’t have a fixed size, like a name or a list of child objects. I will discuss this in a future post when I talk about “arrays of arrays”. For now, let’s just assume that all objects are fixed-size PODs.

As an example, here’s what the bulk data structures for our hypothetical sound system might look like:


typedef struct {
    resource_t *resource;       // Resource manager data
    uint64_t bytes;             // Size of data
    uint64_t format;            // Data format identifier
} sound_resource_t;

typedef struct {
    sound_resource_t *resource; // Resource that's playing
    uint64_t samples_played;    // Number of samples played
    float volume;               // Volume of playing sound
} playing_sound_t;

typedef struct {
    playing_sound_t *sound;     // Faded sound
    float fade_from;            // Volume to fade from
    float fade_to;              // Volume to fade to
    double fade_from_ts;        // Time to start fade
    double fade_to_ts;          // Time to end fade
} playing_fade_t;

When thinking about how to store bulk data, we have a couple of goals:

  • Adding and deleting objects should be fast.

  • The data should be laid out in a cache-friendly way so that we can iterate quickly over it for system updates.

  • It should support referencing — there should be a way to talk about specific objects in the bulk data. In the example above, the fade needs to be able to indicate which sound it is fading. I’ve written the references as pointers in the example but depending on how we implement the bulk data we might use something else.

  • The data should be allocator friendly — it should use a few large allocations, rather than allocating single objects on the heap.

JikGuard.com, a high-tech security service provider focusing on game protection and anti-cheat, is committed to helping game companies solve the problem of cheats and hacks, and providing deeply integrated encryption protection solutions for games.

Read More>>