Virtual Memory Tricks

Nov. 7, 2017

I’ve been planning for some time to make a post about things you can do with virtual memory, so when @jimsagevid tweeted me about virtual memory in response to my last post, I felt the time was right.

 

Virtual memory is funny. As programmers, we know that it’s there (on all modern CPUs and OSs), but we tend to forget about it, perhaps because it’s not directly exposed in our programming languages. Or, we just think of it as the thing that makes our software run really slow instead of crashing when it runs out of physical RAM.

But, it turns out, if you actually make use of what virtual memory can do, you can achieve some pretty cool things.

Obscenely big array

To recap — in my last post I faced the problem of wanting a big lookup table from object IDs to object pointers. Furthermore, I wanted this table to stay fixed in memory, so that writers could atomically replace pointers in the table without disturbing readers. This means that using something like a std::vector is not possible, since the underlying data will be reallocated when the vector grows.

A fixed array works:


object_o *objects[MAX_OBJECTS];

But what size should we use for the array? Pick a big size, and we’re wasting memory. Pick a small size, and we might run out of objects. In the article, I used a hierarchical approach, but as @jimsagevid pointed out, we could use virtual memory instead, and avoid the complexity of a hierarchical table layout.

When we allocate memory through the virtual memory system, the address space is reserved immediately (no other code can allocate memory in that address range), but physical memory is not allocated until we actually touch the pages.

So in this situation, we can just pick a comfortably large size for our array and allocate it virtually. Say, 1 billion objects:


#define MAX_OBJECTS 1000000000ULL
object_o **objects = virtual_alloc(MAX_OBJECTS * sizeof(object_o *));

We’re using 8 GB of address space and virtual memory, but only as much physical memory as we actually need for our objects. A simple solution that only requires one line of code.

Note: I’m using virtual_alloc() here as my OS-agnostic virtual memory allocation call. On Windows you would actually call VirtualAlloc() and on Linux mmap().
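
For reference, here’s a minimal sketch of what such a wrapper might look like (virtual_alloc() is a made-up name, as noted above; error handling omitted):

// A minimal sketch of the hypothetical virtual_alloc() wrapper used in
// this post, assuming a Win32 or POSIX target.
#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>
static void *virtual_alloc(uint64_t size)
{
    // Reserve and commit in one call; see the note below about
    // splitting these two steps on Windows.
    return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
}
#else
#include <sys/mman.h>
static void *virtual_alloc(uint64_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
#endif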

Another note: Windows splits virtual memory allocation into separate MEM_RESERVE and MEM_COMMIT steps. MEM_RESERVE reserves address space and MEM_COMMIT commits it. Even then, physical memory is not allocated until you actually touch the pages, but MEM_COMMIT does reserve space in the page file, and if you try to commit more memory than your page file can hold, MEM_COMMIT will fail. So on Windows, you probably don’t want to MEM_COMMIT the entire lookup table (since the page file has a limited size). Instead, you want to MEM_RESERVE the whole table in the beginning and then just MEM_COMMIT the range of the table that you actually use.
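
For illustration, a Windows-only sketch of that pattern might look like this (n is whatever object count you currently need):

// Windows-only sketch: reserve address space for the whole table up
// front, then commit pages incrementally as more objects are needed.
object_o **objects = VirtualAlloc(NULL, MAX_OBJECTS * sizeof(object_o *),
    MEM_RESERVE, PAGE_NOACCESS);

// Later, before using the first n slots, commit that range:
VirtualAlloc(objects, n * sizeof(object_o *), MEM_COMMIT, PAGE_READWRITE);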

In contrast, Linux allows overcommitting, i.e., you can commit more memory than the size of your page file. For this reason, Linux doesn’t need separate reserve and commit operations, which makes virtual memory a lot simpler to use.

Is reserving 8 GB of virtual memory for the array a problem? Not really. There are two limits that concern us here. The first is address space. In a 64-bit application (which we assume), the address space is 2⁶⁴ bytes, a staggeringly large number with room for billions of gigabyte-sized arrays. The second limit is on virtual memory. The OS typically doesn’t let us allocate the whole possible address space as virtual memory. On 64-bit Windows, for example, we can only allocate 256 TB of virtual memory. Still, that’s room for 32,768 arrays of 8 GB each, so as long as we don’t go totally crazy, we’re fine.

In fact, you could probably use this technique for all global or semi-global arrays in your application without any problems.

Remember the old-school C way of writing games with static arrays for all objects:


uint32_t num_tanks;
tank_t tanks[MAX_TANKS];

uint32_t num_bullets;
bullet_t bullets[MAX_BULLETS];

...

If you write code like this, you can be sure that someone will complain that it’s not “future-proof”, since you have caps on the number of objects. That is kind of ridiculous, but to satisfy the complaint in a simpler and more fun way than using a std::vector, you can just get rid of the MAX_* defines and allocate 1 GB of virtual memory for each array:


#define GB 1000000000
uint32_t num_tanks;
tank_t *tanks = virtual_alloc(GB);
uint32_t num_bullets;
bullet_t *bullets = virtual_alloc(GB);
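
With that in place, adding an object is just a write; the OS pages in physical memory on demand. A hypothetical usage sketch:

// Appending a tank touches the next slot, which faults in a physical
// page on first write (the tank_t initializer here is hypothetical).
tanks[num_tanks++] = (tank_t){0};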

Application-wide unique IDs

A lot of game engine systems need unique IDs to identify objects. Usually the code looks something like this:


uint64_t allocate_id(system_t *sys)
{
    return sys->next_free_id++;
} 

But virtual memory gives us another option. Instead of using integers as identifiers, we could use addresses reserved from the virtual memory system. This allows us to create IDs that are unique, not just in the current system, but throughout the entire application. And since we never actually use the memory pointed to by these addresses, we won’t consume any physical memory.

It might look something like this:


system_id_t *allocate_id(system_t *sys)
{
    // Grab a fresh page of address space when the current one is used
    // up. We never write to these pages, so they consume no physical
    // memory.
    if (!sys->id_block || sys->id_block_used == PAGE_SIZE) {
        sys->id_block = virtual_alloc(PAGE_SIZE);
        sys->id_block_used = 0;
    }
    // Each ID is a unique address, burning one byte of address space.
    return (system_id_t *)(sys->id_block + sys->id_block_used++);
}

Note that by using a pointer to an opaque struct for the ID we also get some type safety that we didn’t have with the uint64_t approach.
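
For completeness, here’s what the declarations assumed by the snippet above might look like (a sketch; the field names are my own):

// system_id_t is deliberately left opaque: we never define it, so IDs
// can't be confused with plain integers or dereferenced by accident.
typedef struct system_id_t system_id_t;

typedef struct system_t {
    char *id_block;          // current page of reserved address space
    uint64_t id_block_used;  // bytes of the page handed out as IDs
} system_t;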

Memory overwrite detection

One of the hardest bugs to fix is the random memory overwrite, where some piece of code writes to memory that it shouldn’t touch. Later, when some other piece of code tries to use that memory, it reads the trash data and (most likely) crashes. The problem is that while the crash itself is easy to find, it is tricky to track down the bad write that corrupted the data in the first place.

Luckily, virtual memory can help us here too.

To see why, first note that the term random memory overwrite is actually a misnomer. Just like regular space, address space is mostly empty. With a 64-bit address space and an application size of, say, 2 GB, the address space is 99.999999988 % empty. This means that if the memory overwrites were truly random, most likely they would hit this empty space and cause a page fault/access violation. That would give us a crash at the point of the bad write, instead of the innocent read, which would make the bug much easier to find and fix.

But of course, the writes are usually not truly random. Instead they typically fall in one of two categories:

  • Writing to memory that has been freed.

  • Writing beyond the allocated memory for an object.

In both cases, it is pretty likely that the write will actually hit some other object, rather than empty space. In the first case, the memory will probably have been recycled for something else. In the second case, the write will probably hit a neighboring object, or allocation block headers.

We can make these cases look more like the random case by replacing the standard system allocator with an end-of-page allocator. Such an allocator uses the virtual memory system to allocate each object on its own set of pages, and furthermore, it aligns the object so that the end of the object coincides with the end of the last page.

[Figure: End-of-page block placement.]

This accomplishes two things. First, when we free the pages, we free them in the virtual memory system. This means that attempting to write to the pages after they have been freed will result in an access violation. Note that this typically doesn’t happen with a regular memory allocator — it will save the freed memory in its freelists to be used for other allocations, instead of returning it to the OS.

Second, as the end of the block coincides with the end of the page, writing beyond the end of the block will also cause an access violation. In other words, with this approach, our typical bad memory writes will crash at the point of the write and allow us to easily diagnose the problem.

Since this allocator rounds up all allocations to the page size, it will waste a lot of memory for small allocations. It is probably not something that you want enabled all the time. What I typically do is work with the standard allocator. Then, if I suspect bad memory overwrites, I switch it out for the end-of-page allocator. Once I’ve fixed the problem, I switch back to the standard allocator. This of course requires you to have an architecture where you can easily change which allocator your system is using.
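
One way to get that flexibility is to route all allocations through a small allocator interface that can be swapped out at startup. A sketch, with names of my own invention:

// Minimal allocator interface; systems allocate through this instead of
// calling malloc()/free() directly, so the implementation can be
// swapped between the standard and end-of-page allocators.
typedef struct allocator_i {
    void *(*alloc)(struct allocator_i *a, uint64_t size);
    void (*free)(struct allocator_i *a, void *p, uint64_t size);
} allocator_i;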

Writing an end-of-page allocator is not complicated at all. Here’s what malloc() looks like:


void *eop_malloc(uint64_t size)
{
    // Round the allocation up to a whole number of pages.
    uint64_t pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
    char *base = virtual_alloc(pages * PAGE_SIZE);

    // Shift the block so its end coincides with the end of the last page.
    uint64_t offset = pages * PAGE_SIZE - size;
    return base + offset;
}
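
Freeing needs to recover the page-aligned base address from the user pointer, which is easy if the caller passes the allocation size. A sketch, assuming a virtual_free() counterpart to virtual_alloc() (VirtualFree()/munmap() under the hood):

void eop_free(void *p, uint64_t size)
{
    // Recompute the page count and the page-aligned base address,
    // then return the whole range to the OS.
    uint64_t pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
    char *base = (char *)p + size - pages * PAGE_SIZE;
    virtual_free(base, pages * PAGE_SIZE);
}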

Note: There are still bad writes that can go undetected with this approach. For example, after we’ve freed the pages, a new page allocation could happen in the same memory range. Also, another set of pages could be allocated directly after ours in memory, in which case we wouldn’t detect overwrites beyond the last page.

Both these problems can be fixed. For the first issue, we can leave the pages reserved, but not committed. That way, the physical memory is freed and we will get page faults, but the addresses are reserved and cannot be used by other objects. For the second issue, we could reserve an extra page after our pages, but not commit that page. No other object could then claim those addresses and writing to them would still cause an access violation. (Note: This only works on Windows, where reserve and commit are separate operations.)
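
On Windows, the first fix boils down to decommitting instead of fully freeing. A sketch:

// Windows-only sketch: release the physical memory and page file space,
// but keep the address range reserved so no one else can claim it.
VirtualFree(base, pages * PAGE_SIZE, MEM_DECOMMIT);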

In practice though, I’ve never needed to take these extra precautions. For me, the basic end-of-page allocator has always been enough to find and fix the bugs.

Fragment-free memory allocation

On old console hardware, memory fragmentation could give programmers nightmares. But even on modern machines, memory fragmentation can be a big problem when trying to implement efficient memory allocators.

A memory allocator works by requesting big chunks of memory from the system and chopping them up into smaller pieces at the user’s request. Memory fragmentation occurs when only some of those pieces are freed, leaving holes in the used memory block:

[Figure: Memory fragmentation.]

The problem here is that if the user requests a big chunk of memory, none of the “holes” may be big enough. This means we have to allocate it from the free memory block at the end, increasing the memory use of the application. In other words, we have effectively lost the memory in the holes to fragmentation.
