[In this reprinted #altdevblogaday in-depth piece, Turbine senior software engineer Rich Skorski introduces readers to the Windows x64 Application Binary Interface, and explains how it works.]

I've become fascinated with x64 code recently, and have taken on a quest to learn about it. There's a fair amount of information on the net, but there isn't nearly as much for x64 as for x86 code. Some of the sources I've found were wishy-washy, too, since they were created before or shortly after the rules were agreed upon. I have found very little that explains the performance considerations that are not immediately apparent and would come as a surprise to x86 experts. If you're here, I'm sure you're just as interested in it as I am. Let me tell you what I know…

What is an ABI?

ABI stands for Application Binary Interface. It's a set of rules that describe what happens when a function is called in your program, and answers questions like how to handle parameters and the stack for a function call, what registers (if any) are special, how big data types are…those sorts of things. These are the rules that the compiler guys follow when they're determining the correct assembly to use for some bit of code. There are a lot of rules in the x64 ABI, but the rules that are most open to interpretation make up what's known as the calling convention.

What is a calling convention?

A calling convention is a set of rules in an ABI that describes what happens when a function is called in your program. That only applies to an honest-to-goodness call. If a function is inlined, the calling convention does not come into play. For x86, there are multiple calling conventions. If you don't know about them, Alex Darby does a great job explaining them: start with C/C++ Low Level Curriculum part 3: The Stack and read the later installments as well.

Differing ABIs

An ABI can be specific to a processor architecture, OS, compiler, or language. 
You can use that as the short answer as to why Win32 code doesn't run on a Mac: the ABI is different. Don't let the compiler-specific implementation scare you. The rules for an OS and processor are quite solid, so every compiler has to follow them. The differences can be in how they define the calling convention. If you think about it, a processor doesn't know exactly what the stack or functions are. Those are the crucial parts of a calling convention. There are processor instructions that facilitate the implementation of the concepts, but it's up to programmers to use them for great justice. The compiler takes care of most of that, so we're at the whims of the compiler's implementation when it comes to calling conventions. It's more likely that the calling convention rules are influenced by the programming language than anything else.

The finer details will only be a burden if you're linking targets built by different compilers. Even then you might not run into any problems because the calling convention is currently standardized for a given platform. I only mention it in case you read this sometime after it was written and the compilers have diverged. If it comes to that, certainly consult vendor documentation.

It's worth highlighting that the idea of having multiple calling conventions is unique to 32-bit Windows. The reason for that is partly legacy and partly because there are few registers compared to other architectures. Raymond Chen wrote a series explaining some of the history. Here's the 1st in the 5-part series: The history of calling conventions, part 1.

What do you mean by x64?

The label x64 refers to the 64-bit processor architectures that extend the x86 architecture. Its full name is x86-64. You can run x86 code on these processors. The x86-64 moniker might be something you see in hardware documentation, but x64 is probably what you'll see most often. AMD and Intel chips have different implementations and instructions, thus their own distinct names. 
AMD's is conveniently named AMD64. Intel has a few: IA-32e, EM64T, and Intel 64. EM64T and Intel 64 are synonymous, the latter being the most prominent in Intel's docs. They say there are "slight incompatibilities" between IA-32e and Intel 64, but I don't know what they are. If you are curious, they're buried somewhere in these docs.

What is the x64 hardware like?

In my opinion, the best thing x64 offers is more registers. This increase in registers is a big contributing factor to the differences in the x64 calling convention when compared to x86. I'll leave it up to vendor documentation to tell you more about the hardware because I won't have much to add. For a quick reference, here are the new registers available to you:
RIP, RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
These are extensions of the x86 registers with similar names; note the "R" prefix. They are 64 bits wide.
R8, R9, R10, R11, R12, R13, R14, R15
These are new integer registers. The numbering picks up where the eight legacy registers leave off, as if RAX were register 0 and the count continued. These are also 64 bits wide.
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15
New SSE registers. They are 128 bits wide.
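As a rough illustration of how the narrower x86 register names relate to the 64-bit registers listed above, the following C sketch extracts the bit ranges that EAX, AX, AL, and AH cover within RAX. The function names are made up for this example, and real code never needs to do this by hand since the sub-registers are addressable directly in assembly.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the bits of a 64-bit register value
   that the named x86 sub-register would expose. */

/* EAX: the low 32 bits of RAX. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }

/* AX: the low 16 bits of RAX. */
uint16_t view_ax(uint64_t rax) { return (uint16_t)(rax & 0xFFFFu); }

/* AL: the low 8 bits of RAX. */
uint8_t view_al(uint64_t rax) { return (uint8_t)(rax & 0xFFu); }

/* AH: bits 8 through 15 of RAX. */
uint8_t view_ah(uint64_t rax) { return (uint8_t)((rax >> 8) & 0xFFu); }
```

For a value of 0x1122334455667788, these return 0x55667788, 0x7788, 0x88, and 0x77 respectively.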
You can still access certain portions of the registers by using mnemonics like EAX, AX, AH, AL. The new integer registers use different suffixes, and they don't have any mnemonic that's equivalent to the "H" suffix. You can read more about that here: MSDN – x64 Architecture.

Are there different x64 calling conventions?

On Windows, there is only one calling convention, aptly named the "Windows x64 calling convention." On other platforms there is another: the "System V calling convention." That "V" is the Roman numeral 5. System V is the only option on those systems. So there are 2 calling conventions, but only 1 will be used on a given platform. There are some similarities between the Windows and System V calling conventions, but don't let that fool you. It would be dangerous to treat them as interchangeable. I myself made that mistake (or would have if I were developing outside of a Windows environment).

There are also syscalls, which are direct calls to the kernel. The rules for calling them differ from the rules for the functions you'll be writing. I won't be discussing System V or syscalls here. I'm not familiar enough with either to speak well about them, and as a game developer you may never deal with them. But be aware that they exist.

A tip of the hat toward consistency

A theme you'll see with the Windows x64 calling convention is consistency. The fact that there aren't optional calling conventions like there were for Windows x86 is an example of that. The stack pointer doesn't move around very much, and there aren't many "ifs" in the rules regarding parameter passing. I wasn't part of any decisions about the calling convention, so I can't be certain. But looking at how it turned out, I get the impression that any decision that may seem peculiar was made for consistency. I'm not suggesting that alternative solutions would have led to unbearable pain and destruction. I'm merely suggesting a reason why the calling convention is the way it is. 
How does the Windows x64 calling convention work?

The first 4 parameters of a function are passed in registers. The rest go on the stack. Different registers will be used for floats vs. integers. Here's what registers will be used and the order in which they'll be used:
Integer: RCX, RDX, R8, R9
Floating-point: XMM0, XMM1, XMM2, XMM3
Integer types include pointers, references, chars, bools, shorts, ints, and longs. Floating-point includes floats and doubles. All parameters have space reserved on the stack, even the ones passed in registers. In fact, there's stack space for 4 parameters even if your function doesn't have any params. Each of those parameter slots is 8 bytes, so that's at least 32 bytes on the stack for every function (every function actually has at least 48 bytes on the stack…I'll explain that another time). This stack area is called the home space. There are a few reasons behind this home space:
If the registers need to be used for something else, the called function can store the data in the home space without moving the stack pointer.
It keeps the stack structure easy to determine. That's very handy for debugging, and perhaps necessary for x64's stack metadata (another point I'll come back to another time).
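To make the parameter rules above concrete, here is a minimal C sketch that reports where a parameter would be passed and how large the home space is. All names in it are invented for illustration. It also relies on one detail the text above does not spell out, taken from Microsoft's documentation of the convention: the register is chosen by the parameter's position, so a float in the second position uses XMM1 rather than XMM0.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names only; none of these come from a real API. */
typedef enum { PARAM_INTEGER, PARAM_FLOATING_POINT } ParamKind;

enum { REGISTER_PARAMS = 4, SLOT_BYTES = 8 };

/* Register used for the parameter at a zero-based position, or "stack"
   for the fifth parameter onward. Assumes the register is picked by
   position: a float in the second position uses XMM1, not XMM0. */
const char *param_location(size_t position, ParamKind kind)
{
    static const char *const integer_regs[REGISTER_PARAMS] = {"RCX", "RDX", "R8", "R9"};
    static const char *const float_regs[REGISTER_PARAMS] = {"XMM0", "XMM1", "XMM2", "XMM3"};

    assert(kind == PARAM_INTEGER || kind == PARAM_FLOATING_POINT);
    if (position >= REGISTER_PARAMS) {
        return "stack";
    }
    return (kind == PARAM_FLOATING_POINT) ? float_regs[position] : integer_regs[position];
}

/* Size of the home space: four 8-byte slots, reserved for every call. */
size_t home_space_bytes(void)
{
    return (size_t)REGISTER_PARAMS * SLOT_BYTES;
}
```

For a hypothetical function `void f(int a, double b, char *c, float d, int e)`, this gives RCX for `a`, XMM1 for `b`, R8 for `c`, XMM3 for `d`, and the stack for `e`, with 32 bytes of home space.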
