In-Depth: Quasi Compile-Time String Hashing

Oct. 31, 2011
In-Depth: Quasi Compile-Time String Hashing

Author: by Stefan Reinalter

[In this reprinted #altdevblogaday-opinion piece, FH Technikum Wien lecturer and former Sproing Interactive senior programmer Stefan Reinalter walks readers through quasi compile-time string hashing step-by-step.] One scenario that is quite common in all game engines is having to look-up some resource (texture, shader parameter, material, script, etc.) based on a string, with e.g. code like the following:

Texture* tex = textureLibrary->FindTexture("my_texture");
ShaderParameter param = shader->FindParameter("matrixWVP");

For today, we are not so much concerned about how the key-value pairs are stored, and how the look-up is done internally (read this excellent post instead), but rather how to get rid of any hashing done at run-time. You might be surprised at how good C++ compilers have become when it comes to evaluating expressions known at compile-time, so let us put that to good use. Assuming all our textures are stored in a table using some kind of (Hash, Texture) key-value pair (be it in a std::map, a simple array, or something more advanced), the code for finding a texture would vaguely look as follows:

Texture* TextureLibrary
     ::FindTexture(const char* name)
{
 const unsigned int hash = CalculateHash(name);
 // lookup the texture in some container using 'hash'
}

So even for constant strings, CalculateHash() will still be called, and the hash has to be generated each and every time this function is called with a constant string. This is something we would like to get rid of. The first step is to ensure that each and every class dealing with hashes makes use of the same functionality, while still retaining the same syntax for users of those classes. So instead of taking a const char* as argument, we could define a simple class holding nothing more than a hash internally, which will be passed to the function by value instead:

class StringHash
{
private:
 unsigned int m_hash;
};

Texture* TextureLibrary::FindTexture(StringHash hash)
{
 // no more hash calculations in here
 // lookup the texture in some container using 'hash'
}

This way, our hashed string class can tackle the problem of distinguishing between constant and non-constant strings, and we can be sure that no hash calculations take place inside the function. But still, how do we make sure that constant strings are not re-hashed every time? A Simple Hash Function Let's assume that hashing strings is done using a simple FNV-1a hash with the following implementation:

unsigned int CalculateFNV(const char* str)
{
 const size_t length = strlen(str) + 1;
 unsigned int hash = 2166136261u;
 for (size_t i=0; i<length; ++i)
 {
  hash ^= *str++;
  hash *= 16777619u;
 }

 return hash;
}

Looking at the above FNV-1a hash implementation, let's try to figure out the resulting hash values for known strings (don't forget about the null terminator):

  • "" = (2166136261u^ 0) * 16777619u

  • "a" = (((2166136261u^ 'a') * 16777619u)^ 0) * 16777619u

  • "ab" = (((((2166136261u^ 'a') * 16777619u)^ 'b') * 16777619u)^ 0) * 16777619u)

The algorithm's offset and prime are compile-time constants (I used 2166136261u and 16777619u in the example above), so these expressions really are constant. Helping The Compiler All we need to do is give the compiler some help, which can be achieved by providing concrete implementations for strings of different lengths. Let's put those into our StringHash class:

class StringHash
{
public:
 ME_INLINE StringHash(const char (&str)[1])
  : m_hash((2166136261u^ str[0]) * 16777619u)
 {
 }

 ME_INLINE StringHash(const char (&str)[2])
  : m_hash((((2166136261u^ str[0]) * 16777619u)^ str[1]) * 16777619u)
 {
 }

 // other implementations omitted
};

In case you're not familiar with the syntax (and admittedly it's one of the more awkward ones in C++), the constructors take references to const-char-arrays of sizes 1, 2, and so on. Providing different implementations for strings of different length tremendously helps the compiler/optimizer to fold the constant expressions into the final hash. In addition, we force each constructor to be inlined using ME_INLINE, which is a platform-independent variant of e.g. __forceinline (Visual Studio 2010), passing an additional hint to the optimizer. Spotting The Pattern Many of you may have already spotted the (offset^constant)*prime pattern in the above examples, and indeed we can easily implement our constructors by utilizing the preprocessor:

#define PREFIX(n, data)		((
#define POSTFIX(n, data)	^ str[n]) * 16777619u)

#define ME_STRING_HASH_CONSTRUCTOR(n)						\
ME_INLINE StringHash::StringHash(const char (&str)[n])				\
 : m_hash(ME_PP_REPEAT(n, PREFIX, ~) 2166136261u ME_PP_REPEAT(n, POSTFIX, ~))	\
{										\
}

class StringHash
{
public:
 ME_STRING_HASH_CONSTRUCTOR(1)
 ME_STRING_HASH_CONSTRUCTOR(2)
 ME_STRING_HASH_CONSTRUCTOR(3)
 ME_STRING_HASH_CONSTRUCTOR(4)
 ME_STRING_HASH_CONSTRUCTOR(5)
 ME_STRING_HASH_CONSTRUCTOR(6)
 ME_STRING_HASH_CONSTRUCTOR(7)
 ME_STRING_HASH_CONSTRUCTOR(8)

 // other constructors omitted
};

Did It Work? With this in place, let's take a look at the assembly code generated by the compiler to see whether our trick really worked. We will use the following simple example for that:

printf("Hash test: %d", StringHash("").GetHash());
printf("Hash test: %d", StringHash("test").GetHash());
printf("Hash test: %d", StringHash("aLongerTest").GetHash());
printf("Hash test: %d", StringHash("aVeryLongTestWhichStillWorks").GetHash());

The resulting assembly code (Visual Studio 2010) looks like this:

printf("Hash test: %d", StringHash("").GetHash());
 01311436 push    50C5D1Fh
 0131143B push    offset string "Hash test: %d" (13341ECh)
 01311440 call    printf (1328DD0h)
printf("Hash test: %d", StringHash("test").GetHash());
 01311445 push    0AA234B7Fh
 0131144A push    offset string "Hash test: %d" (13341ECh)
 0131144F call    printf (1328DD0h)
printf("Hash test: %d", StringHash("aLongerTest").GetHash());
 01311454 push    444D1A47h
 01311459 push    offset string "Hash test: %d" (13341ECh)
 0131145E call    printf (1328DD0h)
printf("Hash test: %d", StringHash("aVeryLongTestWhichStillWorks").GetHash());
 01311463 push    6D9D8B39h
 01311468 push    offset string "Hash test: %d" (13341ECh)
 0131146D call    printf (1328DD0h)

As can be seen from the generated instructions (push 50C5D1Fh, push 0AA234B7Fh, push 6D9D8B39h and push 6D9D8B39h), the compiler/optimizer was indeed able to fold every string into its corresponding hash at compile-time, completely eliminating all traces to any StringHash constructor. Finishing Touches All is well for constant strings, but what about non-constant ones? Of course we might want to use some kind of non-hardcoded string (e.g. a std::string, or strings coming from a file) every now and then, and need to provide a constructor for those as well:

class StringHash
{
public:
 StringHash(const char* str)
  : m_hash(CalculateFNV(str))
 {
 }

 ME_STRING_HASH_CONSTRUCTOR(1)
 ME_STRING_HASH_CONSTRUCTOR(2)
 ME_STRING_HASH_CONSTRUCTOR(3)
 ME_STRING_HASH_CONSTRUCTOR(4)

 // other constructors omitted
};
Tags:

No tags.

JikGuard.com, a high-tech security service provider focusing on game protection and anti-cheat, is committed to helping game companies solve the problem of cheats and hacks, and providing deeply integrated encryption protection solutions for games.

Explore Features>>