Minimalist Container Library in C (part 2)

Jan. 30, 2018
protect

In my last blog post I talked about how we only have two basic container types in our foundation library: arrays (“stretchy buffers”) and hashes. Today I’ll elaborate a bit on how we create more complicated types from these basic building blocks.

 

Sometimes you don’t need a dynamic container

Before I get started on that though, I want to quickly note that in many cases you don’t even need a dynamic container. If you know the maximum number of items you are ever going to store, you can just use a regular fixed size C array:


enum {MAX_ITEMS = 64};
uint32_t num_items;
item_t items[MAX_ITEMS];

A caveat here: you should be careful with what you assume when you size these arrays. MAX_MONITORS = 8 may seem reasonable, but some people are crazy.

On the other hand, there is no reason to go to the other extreme either and require everything to be infinite. A menu bar with four billion items is not going to be usable in practice. To figure out when you can use a fixed size array consider this:

  • When you define the API, you can set its boundary conditions. For example, it is perfectly reasonable to say that your API only supports window titles of 512 characters or less, that your log window will only remember the last 1024 messages, etc, etc.

  • Just make sure that your code behaves nice at the boundary conditions. I.e., if someone passes in a longer title, you should truncate it rather than overrun the buffer.

When you don’t define the API you don’t have that luxury. I.e. if the OS supports infinite filenames, your application needs to do it too.

The nice thing about fixed sized arrays is that you can allocate them on the stack or store them in another array without any need for pointer indirection, which makes the code simpler and is nice on the cache. For example:


enum {MAX_PLAYER_NAME_LENGTH = 32};
enum {MAX_PLAYERS = 16};
struct player_t {
    char name[MAX_PLAYER_NAME_LENGTH];
    uint32_t score;
};
struct players_t {
    struct player_t players[MAX_PLAYERS];
    uint32_t num_players;
};

In C++ land you sometimes see fixed size arrays represented by a template class FixedArray<T>. In my opinion, that’s a totally unnecessary overabstraction, since operations on a fixed size array can be written as simple one-liners:


// Push back: items.push_back(x)
items[num_items++] = x;

// Pop back: items.pop_back()
--num_items;

// Swap erase: std::swap(items.begin() + idx, items.back()); items.pop_back();
items[idx] = items[--num_items];

Some people don’t like code like this. They will say that any code that uses postfix and prefix operators is “full of tricks” and “unreadable”. Needless to say, I don’t agree.

The first time you see these operations they may seem tricky and you might need to take a second to figure out what they do, but once you get used to them you will start to immediately recognize them as common idioms. It’s nice to be able to write them tersely as a one-liner. It is also nice to have the code right there and not have to go look up a macro or function definition in a different file to figure out what’s going on.

While I’m talking about idioms — here’s the idiom for writing an iterator-style for loop in C:


for (item_t *it = items; it != items + num_items; ++it)
  // Do something with it

Note how this is shorter than the C++ counterpart:


std::vector<item_t>::iterator end = items.end();
for (std::vector<item_t>::iterator it = items.begin(); it != end; ++it)
    // Do something with it

The “modern C++” variant is shorter, but consider how many concepts you need to know to understand it: iterators, auto, references:


for (auto &it : items)
    // Do something with it

Quick: should that be auto, auto & or auto &&?

In addition to fixed size arrays, fixed size hash tables can also sometimes be useful. Sometimes you have a fixed number of items to look up and don’t need the hash table to grow beyond that. Here’s a little 30-line implementation:


static const uint64_t HASH_UNUSED = 0xffffffffffffffffULL;
typedef struct hash32_static_t
{
    uint64_t *keys;
    uint32_t *values;
    uint32_t n;
} tm_chash32_static_t;

static inline void hash32_static_clear(hash32_static_t *h)
{
    memset(h->keys, 0xff, sizeof(*h->keys) * h->n);
}

static inline void hash32_static_set(hash32_static_t *h, uint64_t key, uint32_t value)
{
    uint32_t i = key % h->n;
    while (h->keys[i] != key && h->keys[i] != HASH_UNUSED)
        i = (i + 1) % h->n;
    h->keys[i] = key;
    h->values[i] = value;
}
static inline uint32_t hash32_static_get(const hash32_static_t *h, uint64_t key)
{
    uint32_t i = key % h->n;
    while (h->keys[i] != key && h->keys[i] != HASH_UNUSED)
        i = (i + 1) % h->n;
    return h->keys[i] == HASH_UNUSED ? 0 : h->values[i];
}

Strings

The Machinery doesn’t have a data type for strings. I wrote a blog post about this a long time ago. To sum it up, I find std::string to be highly overrated:

  • You should almost never use strings in your runtime (prefer hashes).

  • The strings you use should almost always be static. Thus, they can be represented by a const char * and do not need a string class.

  • For the few cases where you need dynamic (growing/shrinking strings), std::string is rarely a good choice. (Appending lots of strings is an O(n^2) operation.)

So how do we deal with dynamic strings? My experience is that most of the time when you do some dynamic string operation, you only need the result temporarily. For example, you might write something like:


std::string path = dir + "/" + file;
open_file(path);

In Our Machinery we handle this with a printf function that uses our temporary allocator system:


char *p = tm_temp_allocator_api.printf(ta, "%s/%s", dir, file);
open_file(p);

This call allocates memory for the string using a temporary allocator, prints the formatted string to that memory and returns it. Unless you allocate a lot of memory, the temporary allocator can get its memory from the stack, which means it doesn’t even touch the heap. Also, I personally prefer using printf() over appending std::strings with operator+.

The other common use case for dynamic strings is when you build up a long string by adding multiple pieces to it. In C++ you might use a std::ostringstream for this, if you happen to like the <iostream> interface, which I don’t. So instead we use — you guessed it — printf!

For this case, we have a separate printf() function that appends the printed text to a stretchy buffer. Here is how it’s implemented, based on the stretchy buffer macros discussed in the last post:


static void array_vprintf(char **a, const char *format, va_list args)
{
    va_list args2;
    va_copy(args2, args);
    int32_t n = vsnprintf(NULL, 0, format, args2);
    va_end(args2);
    uint32_t an = tm_carray_size(*a);
    array_ensure(*a, an + n + 1);
    vsnprintf(*a + an, n + 1, format, args);
    array_resize(*a, an + n);
}
static void array_printf(char **a, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    array_vprintf(a, allocator,format, args);
    va_end(args);
}

// And the usage:
char *a = 0;
array_printf(&a, "Some text: %d\n", 3);
array_printf(&a, "More text: %d\n", 4);

Note the rarely used va_copy() to duplicate the va_list so that we can call vsnprintf() twice — once to get the size of printed string so we can allocate memory for it, and a second time to generate the output. Also note the trick of allocating memory for the NUL byte at the end of the string, but not including that in the size of the array. That way, the array is always terminated by a “virtual” zero and we can use all the normal C string operations on it.

Arrays of arrays

std::vector<T> properly calls constructors and destructors as objects are added and removed. This means that we can use it to represent things such as an array of arrays: std::vector< std::vector<int> >.

Our stretchy buffer implementation treats it’s elements as raw memory blocks, and doesn’t call constructors or destructors, so if you naively try to make an array of arrays it doesn’t work.

What we do instead is store array pointers and then manually allocate/deallocate them as necessary. For example, we might have a setup like this:


struct item_t
{
   /* stretchy buffer */ int32_t *ints;
   ...
};

/* stretchy buffer */ item_t *items;

Now, before we free the stretchy buffer of items, we must loop over the elements and make sure we free all the ints buffers:


for (item_t *i = items; i != array_end(items); ++i)
    array_free(i->ints);
array_free(items);

Part of the argument for destructors in C++ is that this kind of programming is dangerous. It is easy to forget to delete the i->ints object and create a memory leak.

However, in The Machinery we track all memory allocations and at the shutdown of the program we verify that all allocations have been freed. If memory is leaked somewhere we print an error message with the file and the line number where the leak happened, allowing it to be promptly fixed.

This means that this programming practice is much less dangerous. Indeed, since I started using allocation tracking (several years ago), bugs like this have never been a problem. If you don’t yet track and verify your memory allocations, I strongly suggest that you do so.

JikGuard.com, a high-tech security service provider focusing on game protection and anti-cheat, is committed to helping game companies solve the problem of cheats and hacks, and providing deeply integrated encryption protection solutions for games.

Read More>>