In-Depth: C/C++ Low Level Curriculum Part 10: User Defined Types

Jan. 8, 2013

In this reprinted #altdevblogaday opinion piece, Gamer Camp technical director Alex Darby looks in detail at user defined types in C/C++.

Hello again peoples of the interweb. It has been quite a while since the last one (probably even longer than the gap between part 8 and part 9) so I thought I ought to pull my finger out and get the next post in the C/C++ Low Level Curriculum done.

In the previous posts we’ve covered the structural aspects of the language: flow control, functions, and so forth; and so now we move on to looking in detail at user defined types in C/C++ (i.e. struct, class, and associated keywords) which I naively expected to comprise the bulk of this potentially never ending series when I started it. D’oh!

Before we start, dear reader, I’m going to assume that you’re the kind of person / recently self aware google web trawling AI entity who likes to understand your jargon terms and so I will be including appropriate links (probably mostly wikipedia or other ADBAD articles) where appropriate.

You may also want to read the previous posts in this series (though I don’t think this one will particularly rely on older posts) so, in case you missed them, here are the back-links for preceding articles in the series (warning: reading these might take a while…) : Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Part 9

Data Types and Enums

We covered fundamental and intrinsic types in the second post in the series, which also touched on the enum keyword. I deliberately didn’t cover the use of the keywords struct or class in that post, but we did cover some facts about the behaviour of values defined using the enum keyword (i.e. that it was up to the compiler to decide what intrinsic type to use to represent each enumerated type you declare, based on the range required by its values). Helpfully, the C++11 standard made some sweeping changes to the behaviour of enums; amongst which was the ability to specify the fundamental type used to represent the values of each enum. Tasty. Mentioning this welcome change is the extent of our discussion of enum, so let’s get on with starting to look at struct, class, and union.
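As a quick illustration of the C++11 change mentioned above, here is a minimal sketch of an enum with an explicitly specified underlying type. The names ESmall, ELarge, and ToUnderlying are hypothetical and exist only for this example; it also assumes a platform where std::uint8_t and std::uint32_t are available (true of every mainstream compiler I know of).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical names, for illustration only.
// C++11 scoped enum: the type after the colon is the underlying type.
enum class ESmall : std::uint8_t { First, Second, Third };

// An unscoped enum can be given a fixed underlying type in the same way.
enum ELarge : std::uint32_t { Big = 0xFFFFFFFFu };

// The compiler no longer chooses the representation; we did.
static_assert(sizeof(ESmall) == sizeof(std::uint8_t),
              "ESmall is stored as a uint8_t");
static_assert(std::is_same<std::underlying_type<ELarge>::type,
                           std::uint32_t>::value,
              "ELarge is stored as a uint32_t");

// Scoped enums do not convert implicitly, so the cast is explicit.
inline std::uint8_t ToUnderlying(ESmall eValue)
{
    return static_cast<std::uint8_t>(eValue);
}
```

Without the `: type` part, the choice of representation for an unscoped enum is left to the compiler, which is the pre-C++11 behaviour described in the earlier post.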

Thank you Visual Studio Devteam

If you have been really paying attention to the older posts, you might remember that I mentioned some undocumented (and unsupported!) command line options for Microsoft’s Visual Studio C++ compiler which can be used to print out the memory layout of data types defined using the struct, class, or union keywords.

These secret compiler options are /d1reportAllClassLayout which reports the layout of all classes in the current project, and its more user friendly sibling /d1reportSingleClassLayoutxxx (where xxx is a string used to do a substring match against classes that you wish to have reported).

I will be leaning pretty heavily on this compiler for the next few posts, so we may as well cover how to use it. It definitely works in VS2010 and VS2012; it even works with the Express versions. Woo!

You type the command line option into the project’s property pages, under C/C++ > Command Line > Additional Options (n.b. the ‘single’ version used in this post, /d1reportSingleClassLayoutTest, matches any class or struct with the string ‘Test’ in its name).

Output from /d1reportSingleClassLayout

So far so froody. Now, it’s about time we looked at a code snippet defining a simple POD struct (POD types being the simplest cases of aggregate data types) and the output produced by /d1reportSingleClassLayout when we build it…

#include "stdafx.h"
struct STest
{
    int iA;
    int iB;
};
int main(int argc, char* argv[])
{
    return 0;
}

When we compile this with the fancy secret compiler switch, as expected we find an extra bit of information in amongst the usual Visual Studio compiler’s output:

1> class STest size(8):
1>   +---
1> 0 | iA
1> 4 | iB
1>   +---

Hopefully this should appear pretty much self explanatory to you, but in case it doesn’t, rest assured we’re about to look at it in a little more detail.

The first line contains the name of the class and its size in bytes. STest is a struct, but it is reported as a class; don’t worry about this for now. The struct’s name contains the string ‘Test’ which is the substring we specified to match against in the compiler option in order to get class layout information.

The rest of the information details the member-by-member memory layout of the struct, organised by the name of the data members. The number at the start of each line is the memory offset in bytes of that member relative to the start of the struct.

The first thing to note is that the member variables are laid out in memory in the order specified in the struct declaration. Both the C and C++ language specifications guarantee that the memory address of each member will be higher than that of the one declared before it (see this post on Stack Overflow for more detail of the wording). Strictly speaking, in C++ this guarantee only covers members that share the same access control (public, protected, or private), which is trivially true of a struct like STest where everything is public.

In the case of STest the first member iA is at an offset of 0 bytes from the start of the struct; and the second member iB is at an offset of 4 bytes from the start of the struct. Importantly (by doing a little maths with the offsets and the size of the struct) this also tells us that the size taken up by iA is 4 - 0 = 4 bytes, and the size taken up by iB is 8 - 4 = 4 bytes. Since sizeof(int) == 4 with this compiler, this matches up with what we would expect.
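If you don’t have the Visual Studio compiler to hand, the same layout facts can be checked in portable code. The following is a minimal sketch using the standard offsetof macro and sizeof; the helper SizeOfIA is a hypothetical name introduced just for this example, and the checks are written in terms of sizeof(int) rather than a hard coded 4 so that they don’t assume a particular platform.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct STest
{
    int iA;
    int iB;
};

// The same facts /d1reportSingleClassLayout reported, checked at compile time.
static_assert(offsetof(STest, iA) == 0,
              "iA sits at the very start of the struct");
static_assert(offsetof(STest, iB) == sizeof(int),
              "iB follows iA directly");
static_assert(sizeof(STest) == 2 * sizeof(int),
              "no padding between or after the two ints");

// Hypothetical helper: infers the space taken by iA the same way the text
// does, by subtracting its offset from the offset of the next member.
constexpr std::size_t SizeOfIA()
{
    return offsetof(STest, iB) - offsetof(STest, iA);
}
```

If any of these assumptions were wrong for your compiler, the build would fail with the corresponding message rather than silently carrying on.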

Accessing the members of a struct in assembly

We all knew this was coming, right? Woo! I know you all live for hexadecimal numbers and assembler mnemonics.

As always, the main thing I want you to take away from this is not so much the understanding of the specific assembly code itself (though clearly it has its benefits…), but more of a generalised appreciation for the combinations of assembly instructions that ‘smell like’ the compiler accessing the members of a struct or class.

Getting used to the assembly level ‘smells’ of the various high level constructs in compiler generated assembly code will enable you to find your bearings much more quickly in code you see in the disassembly window. Most importantly (assuming that you are lucky enough to have a valid callstack and, like a sensible person, you have symbols for your release build) you should quickly develop the ability to work out which bit of the high level code corresponds to the assembly you’re currently looking at. Win.

Here’s a code snippet that accesses the data members of the struct we just defined:

#include "stdafx.h"
struct STest
{
    int iA;
    int iB;
};
int main(int argc, char* argv[])
{
    STest sOnStack;
    sOnStack.iA = 1;
    sOnStack.iB = 2;
    STest* psOnHeap = new STest;
    psOnHeap->iA = 3;
    psOnHeap->iB = 4;
    delete psOnHeap;
    return 0;
}

Before we look at the disassembly we should explain a little about the snippet. Two instances of STest are created:

sOnStack on the Stack – i.e. automatically allocated by the compiler as a local variable
psOnHeap on the Heap – i.e. dynamically allocated.

The reasons for doing this will become clear once we’ve inspected the assembly.

Aside: technically the area of dynamic memory managed by new and delete in C++ is called the Free Store, but almost everyone calls it the Heap. I’m pretty sure this is because the dynamic memory in C managed by malloc and free has colloquially and historically been known as “the Heap”, and a lot of C++ implementations define new and delete using malloc and free (and most if not all used to).

So here’s the disassembly generated by the VS2010 debug compiler:

    14: STest sOnStack;
    15: sOnStack.iA = 1;
00A01269 mov dword ptr [ebp-8],1 
    16: sOnStack.iB = 2;
00A01270 mov dword ptr [ebp-4],2 
    17: 
    18: STest* psOnHeap = new STest;
00A01277 push 8 
00A01279 call 00A010F5 
00A0127E add esp,4 
00A01281 mov dword ptr [ebp-54h],eax 
00A01284 mov eax,dword ptr [ebp-54h] 
00A01287 mov dword ptr [ebp-0Ch],eax 
    19: psOnHeap->iA = 3;
00A0128A mov eax,dword ptr [ebp-0Ch] 
00A0128D mov dword ptr [eax],3 
    20: psOnHeap->iB = 4; 
00A01293 mov eax,dword ptr [ebp-0Ch] 
00A01296 mov dword ptr [eax+4],4

Looking at the instructions generated for source lines 15 and 16 (and remembering what we learned in post 2 about how variables in memory are accessed in assembly), we can see that both sOnStack.iA and sOnStack.iB are being directly accessed by their memory addresses as offsets from ebp ([ebp-8] and [ebp-4] respectively). Looking at the instructions generated for source lines 19 and 20, we can see that psOnHeap->iA and psOnHeap->iB are being accessed differently. Since this is different to what we have seen before, let’s break it down a little:

For each of these assignments, first the pointer psOnHeap (i.e. the memory address of the instance of STest created at source line 18, stored at [ebp-0Ch]) is loaded into eax, and…
… then the member is accessed via the memory address stored in eax (via [eax] and [eax+4] respectively).

In particular, note that when STest::iB is accessed (at address [eax+4], for source line 20) a 4 byte offset is added, which is exactly the offset that the output from /d1reportSingleClassLayout gave us. Hopefully it should now be pretty obvious why the instances of STest are accessed differently like this, and by extension why I showed code accessing an instance on the Stack and on the Heap (via a pointer):

When an instance of a user defined type is on the Stack, the compiler is in charge of where the instance is stored (relative to the stack frame); and so it can access its members by their direct offsets within the stack frame.
When an instance is stored in a memory location that is not known at compile time (e.g. accessed via a pointer) the compiler can’t do this and has to access it via offsets from the instance’s base address (i.e. the memory address the instance starts at).
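To make the ‘base address plus offset’ idea concrete, here is a minimal sketch that does by hand, in C++, roughly what the mov dword ptr [eax+4],4 instruction does. The helpers AddressOfIB and WriteIBViaOffset are hypothetical names invented for this example; this is an illustration of the addressing arithmetic, not something you would want to write in real code.

```cpp
#include <cstddef>  // offsetof

struct STest
{
    int iA;
    int iB;
};

// Hypothetical helper: computes the address of iB the way the generated
// code does, by adding the member's offset to the instance's base address.
inline int* AddressOfIB(STest* psBase)
{
    char* pBytes = reinterpret_cast<char*>(psBase);  // base address in bytes
    return reinterpret_cast<int*>(pBytes + offsetof(STest, iB));
}

// Hypothetical helper: writes through the computed address, which is the
// C++ level equivalent of "mov dword ptr [eax+4],<value>".
inline void WriteIBViaOffset(STest* psBase, int iValue)
{
    *AddressOfIB(psBase) = iValue;
}
```

Calling WriteIBViaOffset on a pointer to any STest, whether it lives on the Stack or the Heap, has the same effect as writing to its iB member through the pointer; the compiler simply does this arithmetic for you whenever you use -> on a pointer.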
