Hello again, peoples of the interweb. It has been quite a while since the last one (probably even longer than the gap between part 8 and part 9), so I thought I ought to pull my finger out and get the next post in the C/C++ Low Level Curriculum done.
In the previous posts we’ve covered the structural aspects of the language: flow control, functions, and so forth; and so now we move on to looking in detail at user defined types in C/C++ (i.e. struct, class, and associated keywords) which I naively expected to comprise the bulk of this potentially never ending series when I started it. D’oh!
Before we start, dear reader, I’m going to assume that you’re the kind of person / recently self-aware Google web-trawling AI entity who likes to understand your jargon terms, and so I will be including links (probably mostly Wikipedia or other ADBAD articles) where appropriate.
You may also want to read the previous posts in this series (though I don’t think this one will particularly rely on older posts) so, in case you missed them, here are the back-links for preceding articles in the series (warning: reading these might take a while…):
- http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/
- http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/
- http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
- http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/
- http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/
- http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/
- http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/
- http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/
- http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/
Data Types and Enums
We covered fundamental and intrinsic types in the second post in the series, which also touched on the enum keyword. I deliberately didn’t cover the use of the keywords struct or class in that post, but we did cover some facts about the behaviour of values defined using the enum keyword (i.e. that it was up to the compiler to decide which intrinsic type to use to represent each enumerated type you declare, based on the range required by its values).
Helpfully, the C++11 standard made some sweeping changes to the behaviour of enums; amongst them was the ability to specify the underlying type used to represent the values of each enum. Tasty.
Mentioning this welcome change is the extent of our discussion of enum, so let’s get on with looking at struct, class, and union.
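For the curious, here’s a minimal sketch of the C++11 syntax (the enum names and values are made up for illustration): the type after the colon fixes the underlying type, both for old style enums and for the new scoped enum class.

```cpp
#include <cstdint>      // std::uint8_t, std::uint16_t
#include <type_traits>  // std::underlying_type, std::is_same

// C++11 lets you name the underlying type of an (unscoped) enum explicitly...
enum EColour : std::uint8_t
{
    eColour_Red,
    eColour_Green,
    eColour_Blue
};

// ...and of a scoped 'enum class' (whose underlying type is int if you don't specify one).
enum class EFlags : std::uint16_t
{
    None  = 0,
    Dirty = 1 << 0,
    Dead  = 1 << 1
};

static_assert( sizeof( EColour ) == sizeof( std::uint8_t ), "EColour is represented as a uint8_t" );
static_assert( std::is_same< std::underlying_type< EFlags >::type, std::uint16_t >::value,
               "EFlags is represented as a uint16_t" );
```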
Thank you, Visual Studio Dev Team
If you have been really paying attention to the older posts, you might remember that I mentioned some undocumented (and unsupported!) command line options for Microsoft’s Visual Studio C++ compiler, which can be used to print out the memory layout of data types defined using the struct, class, or union keywords.
These secret compiler options are /d1reportAllClassLayout which reports the layout of all classes in the current project, and its more user friendly sibling /d1reportSingleClassLayoutxxx (where xxx is a string used to do a substring match against classes that you wish to have reported).
I will be leaning pretty heavily on this compiler option for the next few posts, so we may as well cover how to use it. It definitely works in VS2010 and VS2012; it even works with the Express versions. Woo!
You type the command line option into the project’s property pages, under Configuration Properties > C/C++ > Command Line > Additional Options (n.b. /d1reportSingleClassLayoutTest is the ‘single’ version, and matches any class or struct with the string ‘Test’ in its name).
[Figure: Output from /d1reportSingleClassLayout]
So far so froody.
Now, it’s about time we looked at a code snippet defining a simple POD struct (POD types being the simplest cases of aggregate data types) and the output produced by /d1reportSingleClassLayout when we build it…
```cpp
#include "stdafx.h"

struct STest
{
    int iA;
    int iB;
};

int main(int argc, char* argv[])
{
    return 0;
}
```
When we compile this with the fancy secret compiler switch, as expected we find an extra bit of information in amongst the usual Visual Studio compiler’s output:
```
1>  class STest    size(8):
1>      +---
1>   0  | iA
1>   4  | iB
1>      +---
```
Hopefully this should appear pretty much self explanatory to you, but in case it doesn’t – rest assured we’re about to look at it in a little more detail.
The first line contains the name of the class and its size in bytes – STest is a struct, but it is reported as a class – don’t worry about this for now.
The struct’s name contains the string ‘Test’, which is the substring we specified to match against in the compiler option in order to get class layout information.
The rest of the information details the member-by-member memory layout of the struct organised by the name of the data members – the number at the start of the line is the memory offset in bytes of that member relative to the start of the struct.
The first thing to note is that the member variables are laid out in memory in the order specified in the class declaration.
A guarantee is given in both the C and C++ language specifications that the memory address of each member will be higher than that of the one declared before it (historically C++ only promised this for members declared with the same access specifier; see this post on Stack Overflow for more detail of the wording).
In the case of STest the first member iA is at an offset of 0 bytes from the start of the struct; and the second member iB is at an offset of 4 bytes from the start of the struct.
Importantly (by doing a little maths with the offsets and the size of the struct), this also tells us that iA and iB each take up 4 bytes; since sizeof(int) == 4 on Win32, this matches up with what we would expect.
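If you want to double check this sort of thing without the secret compiler switch, the standard offsetof macro (from <cstddef>) gives the same information portably. Here’s a minimal sketch that does the ‘little maths’ at compile time; it assumes, as is true in practice on mainstream platforms, that the compiler inserts no padding between two ints.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct STest
{
    int iA;
    int iB;
};

// Members are laid out in declaration order, so iA comes before iB.
static_assert( offsetof( STest, iA ) < offsetof( STest, iB ), "declaration order is preserved" );

// The 'little maths': a member's size (plus any padding after it) is the gap to the
// next member's offset, or to the end of the struct for the last member.
constexpr std::size_t kSizeOfIA = offsetof( STest, iB ) - offsetof( STest, iA );
constexpr std::size_t kSizeOfIB = sizeof( STest ) - offsetof( STest, iB );

static_assert( kSizeOfIA == sizeof( int ), "iA takes up sizeof( int ) bytes" );
static_assert( kSizeOfIB == sizeof( int ), "iB takes up sizeof( int ) bytes" );
```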
Accessing the members of a struct in assembly
We all knew this was coming, right?
Woo! I know you all live for hexadecimal numbers and assembler mnemonics.
As always, the main thing I want you to take away from this is not so much the understanding of the specific assembly code itself (though clearly it has its benefits…), but more of a generalised appreciation for the combinations of assembly instructions that ‘smell like’ the compiler accessing the members of a struct or class.
Getting used to the assembly level ‘smells’ of the various high level constructs in compiler generated assembly code will enable you to find your bearings much more quickly in code you see in the disassembly window, and – most importantly (assuming that you are lucky enough to have a valid callstack – and, like a sensible person, you have symbols for your release build) – you should quickly develop the ability to work out which bit of the high level code corresponds to the assembly you’re currently looking at. Win.
Here’s a code snippet that accesses the data members of the struct we just defined:
```cpp
#include "stdafx.h"

struct STest
{
    int iA;
    int iB;
};

int main(int argc, char* argv[])
{
    STest sOnStack;
    sOnStack.iA = 1;
    sOnStack.iB = 2;

    STest* psOnHeap = new STest;
    psOnHeap->iA = 3;
    psOnHeap->iB = 4;
    delete psOnHeap;

    return 0;
}
```
Before we look at the disassembly we should explain a little about the snippet.
Two instances of STest are created:
- sOnStack on the Stack – i.e. automatically allocated by the compiler as a local variable
- the instance that psOnHeap points to on the Heap – i.e. dynamically allocated with new.
The reasons for doing this will become clear once we’ve inspected the assembly.
Aside: technically the area of dynamic memory managed by new and delete in C++ is called the Free Store, but almost everyone calls it the Heap. I’m pretty sure this is because the dynamic memory in C managed by malloc and free has colloquially and historically been known as “the Heap”, and a lot of C++ implementations define new and delete using malloc and free (and most if not all used to).
So here’s the disassembly generated by the VS2010 debug compiler:
```
 1 |     14:     STest sOnStack;
 2 |     15:     sOnStack.iA = 1;
 3 | 00A01269  mov         dword ptr [ebp-8],1
 4 |     16:     sOnStack.iB = 2;
 5 | 00A01270  mov         dword ptr [ebp-4],2
 6 |     17:
 7 |     18:     STest* psOnHeap = new STest;
 8 | 00A01277  push        8
 9 | 00A01279  call        00A010F5
10 | 00A0127E  add         esp,4
11 | 00A01281  mov         dword ptr [ebp-54h],eax
12 | 00A01284  mov         eax,dword ptr [ebp-54h]
13 | 00A01287  mov         dword ptr [ebp-0Ch],eax
14 |     19:     psOnHeap->iA = 3;
15 | 00A0128A  mov         eax,dword ptr [ebp-0Ch]
16 | 00A0128D  mov         dword ptr [eax],3
17 |     20:     psOnHeap->iB = 4;
18 | 00A01293  mov         eax,dword ptr [ebp-0Ch]
19 | 00A01296  mov         dword ptr [eax+4],4
```
Looking at lines 3 and 5 (and remembering what we learned in post 2 about how variables in memory are accessed in assembly), we can see that both sOnStack.iA and sOnStack.iB are being accessed directly by their memory addresses as offsets from ebp ([ebp-8] and [ebp-4] respectively).
Looking at lines 15-16 and lines 18-19, we can see that psOnHeap->iA and psOnHeap->iB are being accessed differently.
Since this is different to what we have seen before, let’s break it down a little:
- For each of these assignments, first the pointer psOnHeap (i.e. memory address of the instance of STest created at line 7) is loaded into eax (line 15 and line 18), and…
- … then the member is accessed via the memory address stored in eax (line 16 and line 19 – via [eax] and [eax+4] respectively).
In particular, note that when STest::iB is accessed (at address [eax+4], line 19) a 4 byte offset is added, which is exactly the offset that the output from /d1reportSingleClassLayout gave us.
Hopefully it should now be pretty obvious why the instances of STest are accessed differently like this – and by extension why I showed code accessing an instance on the Stack and on the Heap (via a pointer):
- When an instance of a user defined type is on the Stack, the compiler is in charge of where the instance is stored (relative to the stack frame); and so it can access its members by their direct offsets within the stack frame.
- When an instance is stored in a memory location that is not known at compile time (e.g. accessed via a pointer) the compiler can’t do this and has to access it via offsets from the instance’s base address (i.e. the memory address the instance starts at).
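To make the ‘base address + offset’ idea concrete, here’s a minimal sketch that reads iB by hand in the same way: take the instance’s base address and add iB’s byte offset (via the standard offsetof macro rather than a hard coded 4). ReadIBViaBaseAndOffset is a hypothetical helper for illustration only; in real code you would obviously just write psInstance->iB and let the compiler do this for you.

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof
#include <cstring>  // std::memcpy

struct STest
{
    int iA;
    int iB;
};

// Hypothetical helper that reads iB "by hand", using the same base + offset
// addressing as the debug code above ('mov eax,dword ptr [ebp-0Ch]' loads the
// base address, then [eax+4] reaches iB).
int ReadIBViaBaseAndOffset( const STest* psInstance )
{
    assert( psInstance != nullptr );
    const char* pBase = reinterpret_cast< const char* >( psInstance );       // base address
    int iValue;
    std::memcpy( &iValue, pBase + offsetof( STest, iB ), sizeof( iValue ) ); // base + offset
    return iValue;
}
```

It works the same whether you pass &sOnStack or psOnHeap, because all it needs is the instance’s base address.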
NOTE: this is debug disassembly code, please do not attempt to infer anything about the relative efficiency of Stack vs. Heap memory from this! As far as I am aware, on every machine I’ve ever used Stack and Heap are both stored in the same memory and accessed via the same physical systems so in terms of theoretical minimum access speeds Stack == Heap.
What about class?
The short answer to this question is that there is no difference whatsoever between classes and structs at the implementational level of C++.
The long answer is that, in the C++ language, struct is actually a special case of class with one specific difference: the default access level. In a class, members declared before any access specifier (i.e. public, protected, or private) default to private, as does inheritance from base classes; in a struct, both default to public.
That’s it. The only difference. Honest.
Access specifiers are ultimately just language level syntactic sugar to allow us to control the way our classes are used; under the hood struct and class are implemented the same way – even with regards to stuff like inheritance and virtual functions.
If you replace struct with class throughout the snippet and add public: at the top of each class declaration (so that it compiles), you will get exactly the same class layout information and the same disassembly.
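Here’s a minimal sketch of that experiment in portable form, using offsetof and static_assert instead of the class layout output (CTest is just STest with class and public: swapped in).

```cpp
#include <cstddef>  // offsetof

struct STest
{
    int iA;   // a struct's members default to public
    int iB;
};

class CTest
{
public:       // without this, iA and iB would default to private
    int iA;
    int iB;
};

// Same members in the same order: identical size and member offsets.
static_assert( sizeof( STest ) == sizeof( CTest ), "same size" );
static_assert( offsetof( STest, iA ) == offsetof( CTest, iA ), "same offset for iA" );
static_assert( offsetof( STest, iB ) == offsetof( CTest, iB ), "same offset for iB" );
```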
Union
As well as class and struct, there is another keyword that can define a type – the keyword union.
It’s not a frequently seen or used language feature, and so it’s all the more worthwhile discussing here because it can be very useful and its low frequency of use means that a lot of people don’t really know what it’s for, let alone how it works.
Let’s look at this with another example code snippet. This has had two new types added to it:
```cpp
#include "stdafx.h"

struct STest
{
    int iA;
    int iB;
};

struct STestTwo
{
    int iC;
    int iD;
    int iE;
};

union UTestUnion
{
    STest    sTest;
    STestTwo sTestTwo;
};

int main(int argc, char* argv[])
{
    UTestUnion* psOnHeap = new UTestUnion;

    psOnHeap->sTest.iA = 1;
    psOnHeap->sTestTwo.iC = 2;

    psOnHeap->sTest.iB = 3;
    psOnHeap->sTestTwo.iD = 4;

    psOnHeap->sTestTwo.iE = 5;

    delete psOnHeap;
    return 0;
}
```
Compiling this with /d1reportSingleClassLayout we get the following output for the layouts:
```
1>  class STest    size(8):
1>      +---
1>   0  | iA
1>   4  | iB
1>      +---
1>
1>  class STestTwo    size(12):
1>      +---
1>   0  | iC
1>   4  | iD
1>   8  | iE
1>      +---
1>
1>  class UTestUnion    size(12):
1>      +---
1>   0  | STest sTest
1>   0  | STestTwo sTestTwo
1>      +---
```
The first thing to note is that UTestUnion is the same size as STestTwo. This is exactly as one would expect.
The second thing to note is that both UTestUnion::sTest and UTestUnion::sTestTwo have an offset of 0 bytes within UTestUnion. Again, exactly as one would expect.
So, why is this the case?
The keyword union allows you to specify multiple layouts for a chunk of memory. When we declare the union of STest and STestTwo within UTestUnion, we declare our intent to be able to treat the memory of type UTestUnion as either an instance of STest or an instance of STestTwo at our discretion.
This means that, within the type UTestUnion, an instance of STest and an instance of STestTwo exist overlaid on each other. Since the union can be treated as either type, this means that it must necessarily have the same size as the larger of the two types.
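Here’s a minimal sketch that checks the overlay at compile time and then exercises it. One hedge worth knowing: standard C++ doesn’t generally let you read a union member other than the one you last wrote, but because STest and STestTwo are standard layout structs that begin with the same member types (two ints), they share a ‘common initial sequence’, and reading those shared members through either one is explicitly allowed. DemoCommonInitialSequence is a made-up name.

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof

struct STest
{
    int iA;
    int iB;
};

struct STestTwo
{
    int iC;
    int iD;
    int iE;
};

union UTestUnion
{
    STest    sTest;
    STestTwo sTestTwo;
};

// Both members start at offset 0, and the union is as big as its larger member.
static_assert( offsetof( UTestUnion, sTest ) == 0, "sTest starts at the beginning of the union" );
static_assert( offsetof( UTestUnion, sTestTwo ) == 0, "so does sTestTwo" );
static_assert( sizeof( UTestUnion ) == sizeof( STestTwo ), "union is the size of its larger member" );

// Write through sTest, read back through sTestTwo: permitted for the common
// initial sequence (iA/iC and iB/iD), which occupies the same bytes.
void DemoCommonInitialSequence()
{
    UTestUnion uTest;
    uTest.sTest.iA = 1;
    uTest.sTest.iB = 3;
    assert( uTest.sTestTwo.iC == 1 );
    assert( uTest.sTestTwo.iD == 3 );
}
```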
Let’s back this up by looking at the disassembly:
```
 1 |     29:     psOnHeap->sTest.iA = 1;
 2 | 0122127C  mov         eax,dword ptr [ebp-4]
 3 | 0122127F  mov         dword ptr [eax],1
 4 |     30:     psOnHeap->sTestTwo.iC = 2;
 5 | 01221285  mov         eax,dword ptr [ebp-4]
 6 | 01221288  mov         dword ptr [eax],2
 7 |     31:
 8 |     32:     psOnHeap->sTest.iB = 3;
 9 | 0122128E  mov         eax,dword ptr [ebp-4]
10 | 01221291  mov         dword ptr [eax+4],3
11 |     33:     psOnHeap->sTestTwo.iD = 4;
12 | 01221298  mov         eax,dword ptr [ebp-4]
13 | 0122129B  mov         dword ptr [eax+4],4
14 |     34:
15 |     35:     psOnHeap->sTestTwo.iE = 5;
16 | 012212A2  mov         eax,dword ptr [ebp-4]
17 | 012212A5  mov         dword ptr [eax+8],5
```
There it is, clear as day :)
In case you’re not seeing it, here’s a quick breakdown:
- We can see that psOnHeap is stored at [ebp-4].
- (lines 2-3) UTestUnion::sTest::iA and (lines 5-6) UTestUnion::sTestTwo::iC are both accessed directly via the value loaded into eax from [ebp-4], i.e. at an offset of 0 bytes; the same as their offsets within their respective types, as shown in the class layout information.
- (lines 9-10) UTestUnion::sTest::iB and (lines 12-13) UTestUnion::sTestTwo::iD are both accessed via [eax+4], at an offset of 4 bytes from the value loaded into eax from [ebp-4]. Again, this is the same as their offsets within their respective types, as shown in the class layout information.
- (lines 16-17) UTestUnion::sTestTwo::iE is accessed via [eax+8], an offset of 8 bytes as specified in the class layout information.
A more ‘real world’ example of the use of union might be a data structure used in a vector maths library similar to the one below:
```cpp
class CTestVec4
{
public: // members of a class default to private, so make the data accessible
    union
    {
        struct
        {
            float x;
            float y;
            float z;
            float w;
        };
        struct
        {
            float vec[ 4 ];
        };
    };
};
```
The above code declares a vector structure whose data can be accessed either via its components or like an array – e.g. CTestVec4::z occupies the same memory as CTestVec4::vec[ 2 ].
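The code looks like it should be illegal, and it is only partly standard. Leaving all the names out is entirely deliberate: the union is an “anonymous union” (standard C++), which makes the syntax for accessing its members ‘less cumbersome’ (i.e. basically just less typing), and the nameless structs inside it are “anonymous structs”, which are standard in C11 but only a (very widely supported) compiler extension in C++ under Visual Studio, GCC and Clang. Also note that, strictly speaking, standard C++ only lets you read the union member you most recently wrote, so writing z and then reading vec[ 2 ] is type punning that the language itself doesn’t guarantee, even though it is relied upon in practice.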
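If you would rather not rely on the anonymous struct extension, here’s a minimal sketch of a standard-conforming variant (CTestVec4Named and SComponents are hypothetical names): the union stays anonymous, but the component struct gets a name, at the cost of a little extra typing (v.sComp.z rather than v.z). The static_asserts assume 4 byte floats with no padding between them, which holds on mainstream platforms.

```cpp
#include <cstddef>  // offsetof

// Hypothetical standard-conforming take on CTestVec4: the union is still
// anonymous (legal C++), but the component struct is given a name.
class CTestVec4Named
{
public:
    struct SComponents
    {
        float x;
        float y;
        float z;
        float w;
    };

    union
    {
        SComponents sComp;    // accessed as v.sComp.z
        float       vec[ 4 ]; // accessed as v.vec[ 2 ]
    };
};

// z is the third float, i.e. it occupies the same bytes as vec[ 2 ]...
static_assert( offsetof( CTestVec4Named::SComponents, z ) == 2 * sizeof( float ), "z overlays vec[ 2 ]" );
// ...and the union adds no storage beyond the four floats.
static_assert( sizeof( CTestVec4Named ) == 4 * sizeof( float ), "no extra storage" );
```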
If you weren’t sure how union worked, or indeed what it was for, now you know :)
Surely we’re due a spanner in the works about now?
We most certainly are! Well spotted.
There is an incredibly important low level aspect of the way memory is laid out within classes and structs that I have deliberately skimmed over until now.
Consider the following snippet containing an innocent looking struct declaration:
```cpp
#include "stdafx.h"

struct STestSpanner
{
    int iOne;
    double fdTwo;
    char chThree;
    int iFour;
};

int main(int argc, char* argv[])
{
    return 0;
}
```
When we compile this we get the following output from the class layout information:
```
1>  class STestSpanner    size(24):
1>      +---
1>   0  | iOne
1>      | <alignment member> (size=4)
1>   8  | fdTwo
1>  16  | chThree
1>      | <alignment member> (size=3)
1>  20  | iFour
1>      +---
```
What witchcraft is this!?!
We’ve added a char and a double to our struct, requiring a total of an extra 9 bytes (sizeof(char) is always 1, and on Win32 sizeof(double) == 8).
However, the total size of the struct has increased by 16 bytes: in addition to the 9 bytes we know we asked for, the compiler has added 7 bytes of invisible ‘alignment member’ fields.
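Here’s a minimal sketch that finds the same padding portably, by comparing each member’s offset (via offsetof) with where the previous member ends. It assumes Win32 style sizes and alignments (4 byte int, 8 byte double aligned to 8 bytes), which also hold for the usual x86-64 compilers but not, for example, for GCC targeting 32 bit x86 Linux, where double members are only 4 byte aligned.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct STestSpanner
{
    int    iOne;
    double fdTwo;
    char   chThree;
    int    iFour;
};

// Padding between two members = start of the next one minus end of the previous one.
constexpr std::size_t kPadAfterIOne =
    offsetof( STestSpanner, fdTwo ) - ( offsetof( STestSpanner, iOne ) + sizeof( int ) );
constexpr std::size_t kPadAfterChThree =
    offsetof( STestSpanner, iFour ) - ( offsetof( STestSpanner, chThree ) + sizeof( char ) );

// Total padding = size of the struct minus the bytes its members actually need.
constexpr std::size_t kTotalPadding =
    sizeof( STestSpanner ) - ( 2 * sizeof( int ) + sizeof( double ) + sizeof( char ) );

static_assert( kPadAfterIOne == 4, "matches the first <alignment member> (size=4)" );
static_assert( kPadAfterChThree == 3, "matches the second <alignment member> (size=3)" );
static_assert( kTotalPadding == 7, "17 bytes of data in a 24 byte struct" );
```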
What is going on? Padding – that’s what.
Padding!?!
As long as the layout of the data members of a type meet the ordering requirement of the language standard (as covered earlier) they do not have to be immediately adjacent in memory.
The compiler is free, encouraged even, to insert additional padding into your structs / classes at its discretion.
Why might the compiler wish to adjust the layout like this?
The short answer is: to optimise for speed of memory access for the intrinsic types stored in that structure.
The longer answer goes something like this…
On each platform, the various intrinsic types have different sizes and typically also different memory alignment requirements.
In the best case, accessing an intrinsic type at an address that doesn’t meet its alignment requirement will make the memory access slower than usual (on x86 the cost is typically relatively small, but it can be an order of magnitude slower on other platforms); in the worst case the CPU raises a hardware exception that crashes your program (yes, really: on some platforms unaligned access makes the CPU freak out).
There are three separate factors at play in determining the size of STestSpanner:
- The logical number of bytes required by its constituent types – this is the minimum possible size of the type.
- The ordering of the constituent types within the class declaration.
- The individual alignment requirements of those constituent types.
The compiler honours both the ordering of the constituent types and their individual alignment requirements, and this interaction determines the amount of padding bytes that get added.
Since they affect each other and are closely related, the distinction between alignment, packing, and padding often causes confusion:
- Alignment is a constraint on the start address of instances of a type in memory
- Packing is a constraint on the alignment of adjacent members within the memory of a struct or class
- Padding is the bytes added within a class or struct to maintain its packing constraints
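The first of these definitions is easy to check in code. Here’s a minimal sketch (IsAlignedFor and CheckSpannerArrayAlignment are hypothetical helpers) that tests whether an address is a multiple of a type’s alignment, and then uses it to confirm that every element of an array of STestSpanner keeps its members correctly aligned, not just the first element.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t

// Hypothetical helper: does pAddress satisfy T's alignment requirement,
// i.e. is the address a multiple of alignof( T )?
template< typename T >
bool IsAlignedFor( const void* pAddress )
{
    return reinterpret_cast< std::uintptr_t >( pAddress ) % alignof( T ) == 0;
}

struct STestSpanner
{
    int    iOne;
    double fdTwo;
    char   chThree;
    int    iFour;
};

// Thanks to the padding, every element of the array keeps its double and its
// ints aligned, not just the first one.
void CheckSpannerArrayAlignment()
{
    STestSpanner aSpanners[ 4 ];
    for ( std::size_t i = 0; i < 4; ++i )
    {
        assert( IsAlignedFor< double >( &aSpanners[ i ].fdTwo ) );
        assert( IsAlignedFor< int >( &aSpanners[ i ].iFour ) );
    }
}
```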
This Stack Overflow question has a good discussion of the implications of unaligned access on x86, for those of you who are interested.
Sorry, why the padding?
As we have covered before, compiler writers are wily.
You must be able to declare an array of any type that you define, and so the memory layout of a struct or class you define must not only maintain the alignment constraints of its constituent intrinsic types within an instance, but also across an array of instances laid out contiguously in memory.
The simplest way to ensure this is to make the internal structure of the type adhere to the largest alignment constraint of its constituent types – and this means that the size and internal packing of any structure you declare will usually end up being determined by the largest alignment requirement of its constituent types.
In the case of our struct this would be double, which has a default alignment of 8 bytes (at least under the Visual Studio x86 compiler; it varies with other compilers), and consequently so does the structure STestSpanner. Hence the 7 extra bytes of padding, which take the struct from the 17 bytes of data we asked it to contain (2 ints (8 bytes) + 1 double (8 bytes) + 1 char (1 byte)) up to 24 bytes, i.e. the next multiple of 8 that maintains the alignment of double.
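The rules described above are simple enough to model in a few lines. This is a simplified sketch of the general idea (an assumption for illustration, not any particular compiler’s actual algorithm): place each member at the next multiple of its alignment, in declaration order, then round the total size up to the largest alignment so that arrays stay aligned. The names are hypothetical, it needs C++14 for loops in constexpr functions, and the static_asserts compare the model against the real compiler.

```cpp
#include <cstddef>  // std::size_t, offsetof

// One entry per member: its size and alignment requirement.
struct SMemberInfo
{
    std::size_t uSize;
    std::size_t uAlign;
};

constexpr std::size_t RoundUp( std::size_t uValue, std::size_t uAlign )
{
    return ( ( uValue + uAlign - 1 ) / uAlign ) * uAlign;
}

// Offset of member uIndex: lay out the preceding members, then round up to
// this member's alignment.
constexpr std::size_t MemberOffset( const SMemberInfo* pMembers, std::size_t uIndex )
{
    std::size_t uOffset = 0;
    for ( std::size_t i = 0; i < uIndex; ++i )
    {
        uOffset = RoundUp( uOffset, pMembers[ i ].uAlign ) + pMembers[ i ].uSize;
    }
    return RoundUp( uOffset, pMembers[ uIndex ].uAlign );
}

// Total size: end of the last member, rounded up to the largest alignment so
// that every element of an array of the struct is also correctly aligned.
constexpr std::size_t StructSize( const SMemberInfo* pMembers, std::size_t uCount )
{
    std::size_t uMaxAlign = 1;
    for ( std::size_t i = 0; i < uCount; ++i )
    {
        if ( pMembers[ i ].uAlign > uMaxAlign )
        {
            uMaxAlign = pMembers[ i ].uAlign;
        }
    }
    const std::size_t uEnd = MemberOffset( pMembers, uCount - 1 ) + pMembers[ uCount - 1 ].uSize;
    return RoundUp( uEnd, uMaxAlign );
}

struct STestSpanner
{
    int    iOne;
    double fdTwo;
    char   chThree;
    int    iFour;
};

// The model's description of STestSpanner, using the real sizes and alignments.
constexpr SMemberInfo kSpannerMembers[] = {
    { sizeof( int ),    alignof( int ) },
    { sizeof( double ), alignof( double ) },
    { sizeof( char ),   alignof( char ) },
    { sizeof( int ),    alignof( int ) },
};

static_assert( MemberOffset( kSpannerMembers, 1 ) == offsetof( STestSpanner, fdTwo ), "model matches compiler" );
static_assert( StructSize( kSpannerMembers, 4 ) == sizeof( STestSpanner ), "model matches compiler" );
```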
You should find that, no matter how you shuffle the members of STestSpanner around, you still end up with a 24 byte structure that includes 7 bytes of padding.
On the plus side, if we needed to add 3 extra chars and an extra int to STestSpanner, we would get that storage space for free, as long as we put them in the correct positions in the type declaration :)
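Here’s a minimal sketch of both claims (the shuffled struct and the extra member names are made up), assuming a 4 byte int and an 8 byte double aligned to 8 bytes.

```cpp
#include <cstddef>  // offsetof

// STestSpanner's four members, shuffled so the 8 byte aligned double comes first.
struct STestSpannerShuffled
{
    double fdTwo;   // offset 0
    int    iOne;    // offset 8
    int    iFour;   // offset 12
    char   chThree; // offset 16, then 7 bytes of padding to reach 24
};

// The same again, plus three extra chars and an extra int (hypothetical
// members) slotted into what used to be padding.
struct STestSpannerFilled
{
    double fdTwo;    // offset 0
    int    iOne;     // offset 8
    int    iFour;    // offset 12
    int    iExtra;   // offset 16
    char   chThree;  // offset 20
    char   chExtra1; // offset 21
    char   chExtra2; // offset 22
    char   chExtra3; // offset 23
};

static_assert( sizeof( STestSpannerShuffled ) == 24, "17 bytes of data rounded up to a multiple of 8" );
static_assert( sizeof( STestSpannerFilled ) == 24, "24 bytes of data, no padding" );
static_assert( offsetof( STestSpannerFilled, chExtra3 ) == 23, "the last char fills the final byte" );
```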
But what about all the wasted bytes?
The compiler knows what it is doing, and 99% of the time you should not worry about the wasted space.
Get a cup of tea and a biscuit and make peace with it – it’s not wasted space, it’s space invested in making your memory access more efficient.
However, you should worry about it a little – because it is entirely possible to cause the compiler to introduce padding into a type that is actually a total waste of memory.
Consider this struct:
```cpp
struct STestSpannerTwo
{
    int iOne;
    double fdTwo;
    int iThree;
};
```
which produces this class layout:
```
1>  class STestSpannerTwo    size(24):
1>      +---
1>   0  | iOne
1>      | <alignment member> (size=4)
1>   8  | fdTwo
1>  16  | iThree
1>      | <alignment member> (size=4)
1>      +---
```
A quick look at the class layout above informs us that there are 4 bytes of padding between STestSpannerTwo::iOne and STestSpannerTwo::fdTwo, and another 4 bytes of padding after STestSpannerTwo::iThree.
The 64 bit floating point format used to represent double on x86 is 8 bytes long, and clearly has a default alignment of 8 bytes under Visual Studio’s compiler.
The constraints of our type declaration combined with the 8 byte alignment constraint for double have resulted in:
- 4 padding bytes after STestSpannerTwo::iOne and before STestSpannerTwo::fdTwo to maintain the alignment within the memory of a single instance of STestSpannerTwo…
- …and 4 padding bytes after STestSpannerTwo::iThree to maintain the 8 byte alignment across instances of STestSpannerTwo that are contiguous in memory (i.e. an array of STestSpannerTwo).
However, we can also see that STestSpannerTwo::iThree is a 4 byte int and so it will fit into the first block of padding; eliminating the need for the 8 padding bytes.
Re-ordering the members by hand like this will save 8 bytes off the total size of the struct, and so we can see that – in this case – we can save 33% of the memory used by the struct basically for free – don’t take my word for it, try it!
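A minimal sketch of that experiment, assuming a 4 byte int and an 8 byte double aligned to 8 bytes (STestSpannerTwoReordered is just a made-up name for the re-ordered version):

```cpp
struct STestSpannerTwo          // as declared above: int, double, int
{
    int    iOne;                // offset 0, then 4 bytes of padding
    double fdTwo;               // offset 8
    int    iThree;              // offset 16, then 4 bytes of padding
};

struct STestSpannerTwoReordered // iThree moved into the padding after iOne
{
    int    iOne;                // offset 0
    int    iThree;              // offset 4
    double fdTwo;               // offset 8
};

static_assert( sizeof( STestSpannerTwo ) == 24, "4 + 4 (padding) + 8 + 4 + 4 (padding)" );
static_assert( sizeof( STestSpannerTwoReordered ) == 16, "4 + 4 + 8, no padding" );
```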
Whilst this isn’t something you should lose sleep over, you should now be able to see the benefit of always taking a second to consider the most appropriate place to insert a new data member into an existing type ;)
…but what if I really need those padding bytes?
Unsurprisingly, this being C/C++, it is entirely possible to ask the compiler to change its default alignment and packing behaviour.
This is usually accomplished by use of command line compiler options and/or compiler specific commands that are inserted inline in your code.
In Visual Studio, for example, there is the /Zp compiler option, plus another 2 ways to affect the alignment of data structures and the packing of their members with compiler commands in the code itself: __declspec( align( x ) ) and #pragma pack( x ). There may also be others I’ve never seen or used, but a quick search on t’interwebs didn’t find them.
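For what it’s worth, C++11 added a standard keyword, alignas, that covers much the same ground as __declspec( align( x ) ). Here’s a minimal sketch (SAlignedVec4 and SHasAlignedMember are hypothetical names, and 16 is just an example value of the kind you might want for SIMD loads), assuming 4 byte floats.

```cpp
// C++11's standard alignas keyword: roughly the portable counterpart of
// putting __declspec( align( 16 ) ) on the struct.
struct alignas( 16 ) SAlignedVec4
{
    float x;
    float y;
    float z;
    float w;
};

// Any struct containing one inherits the 16 byte alignment requirement,
// which shows up as padding before the member and at the end of the struct.
struct SHasAlignedMember
{
    char         chFlag;  // offset 0, then 15 bytes of padding
    SAlignedVec4 vPos;    // offset 16
};

static_assert( alignof( SAlignedVec4 ) == 16, "alignment raised to 16 bytes" );
static_assert( sizeof( SHasAlignedMember ) == 32, "1 + 15 (padding) + 16" );
```

Note that alignas can only increase a type’s alignment; squeezing padding out is the job of #pragma pack.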
For example, using #pragma pack to tell the compiler to pack STestSpanner to a 1 byte boundary like this:
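```cpp
#pragma pack(push, 1)   // save the current packing, then pack to 1 byte boundaries
struct STestSpanner
{
    int iOne;
    double fdTwo;
    char chThree;
    int iFour;
};
#pragma pack(pop)       // restore the previous packing for any code that follows
```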
Gives this class layout output:
```
1>  class STestSpanner    size(17):
1>      +---
1>   0  | iOne
1>   4  | fdTwo
1>  12  | chThree
1>  13  | iFour
1>      +---
```
Be warned!
Adjusting the packing of a type can (and probably will) break the alignment constraints of its constituent types – in the example above there are no ‘wasted’ bytes; but STestSpanner no longer honours the alignment requirements of its constituent types and so will presumably take significantly longer to access than it would do had we not fiddled about with it.
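Here’s a minimal sketch of how the packed layout breaks the rules, checked at compile time. It uses #pragma pack( push, 1 ) / #pragma pack( pop ), which Visual Studio, GCC and Clang all accept, so that the packing doesn’t leak into code that follows; STestSpannerPacked is a hypothetical name, and the 8 assumes double’s usual 8 byte alignment.

```cpp
#include <cstddef>  // offsetof, std::size_t

#pragma pack( push, 1 )   // pack to 1 byte boundaries: no padding at all...
struct STestSpannerPacked
{
    int    iOne;
    double fdTwo;
    char   chThree;
    int    iFour;
};
#pragma pack( pop )       // ...then restore the previous packing

static_assert( sizeof( STestSpannerPacked ) == 17, "no padding, as in the layout above" );

// Within one instance, fdTwo now starts 4 bytes in: not a multiple of 8.
static_assert( offsetof( STestSpannerPacked, fdTwo ) % 8 != 0, "fdTwo is misaligned for an 8 byte aligned double" );

// In an array, element [ 1 ] starts at byte 17, so its fdTwo lands at byte 21,
// which isn't a multiple of 8 (or even of 4).
constexpr std::size_t kSecondFdTwoOffset = sizeof( STestSpannerPacked ) + offsetof( STestSpannerPacked, fdTwo );
static_assert( kSecondFdTwoOffset % 8 != 0, "still misaligned in an array" );
```

Accessing a member such as sPacked.fdTwo directly is generally fine, because the compiler knows the type is packed and can generate suitable (if slower) code; the bigger danger is taking a double* or double& to a packed member and using it somewhere the compiler assumes normal alignment. GCC and Clang can warn about this with -Waddress-of-packed-member.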
This means that when you ask the compiler to change the packing of a type you need to be very careful.
My advice is that – in general – you shouldn’t mess with alignment or padding unless you have a very good reason to; altering padding and alignment can get you into a whole world of pain, especially with bigger projects and when using code libraries – here’s a link to a post on the Visual C++ Team Blog that goes into it in some detail.
The decisions to change alignment or padding will typically come down to essentially platform specific trade offs based on data derived from run time profiling – balancing issues like data sizes vs. memory resource constraints, worst case access times of individual parts of the data, and issues relating to system cache sizes and alignments.
Summary
In summary here are the main points I’d like you to take away from this post:
- The undocumented compiler options /d1reportSingleClassLayout and /d1reportAllClassLayout are awesome, and can help you to understand the memory use implications of code you write, as well as being very useful debugging tools
- We now know that, when an instance of a structure is accessed via pointer, its members are accessed via an offset from the instance’s base address in the assembly, and …
- … that, logically, we can use this in the disassembly view to work out which member is being accessed
- The difference between struct and class, i.e. that there is no low level representational difference.
- What the keyword union does and how it works
- What padding and alignment are and some of their implications
Next time we will look at how (simple) inheritance affects this picture…
Epilogue: Debugger Trick 17a
Here’s an alternative way to find out the offset of a member within a user defined data type, a way that you can happily use in the debugger rather than having to compile the code.
This method works with the vast majority of debuggers I’ve used in the last 5 years or so on both PC and console; and it relies on the fact that the standard C style cast syntax works in watch windows (try it! It’s awesome).
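One upshot of this is that you can use casting to calculate the byte offset of any member of any user defined type, by entering an expression such as &(((STest*)0)->iB) into a watch window (the same expression will also compile as C/C++ code; it is essentially how the offsetof macro was traditionally implemented).
That expression looks horrible, not to mention dangerous, but what it’s doing is actually very simple and, in the debugger, totally safe. (In compiled code it is technically undefined behaviour, so there you should use the standard offsetof macro from <stddef.h> / <cstddef>, which gives the same answer portably.)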
We have seen that, when using a pointer to an instance of a type, the compiler accesses members of that user defined type by adding an offset to the memory address the instance is stored at (its ‘base address’).
Note that the values the watch window shows for such expressions (e.g. 4 for iB) are identical to the offsets reported by /d1reportSingleClassLayout.
Here’s how this works:
- (STest*) 0 - tells the debugger to treat 0 as the value of a pointer to an instance of STest. If you’re thinking “but that’s a NULL pointer!”, remember that this code never accesses the memory at that address; it just asks the debugger to treat 0 as the value of a pointer. (As an aside, whether address 0 is really off limits is down to the platform; in fact, on some consoles, 0 is a valid memory address and can be accessed…)
- &(((STest*)0)->iB) - tells the debugger to calculate the address of iB within an STest that starts at address 0; since the instance’s base address is 0, that address is simply the offset of iB. Again, since this only calculates an address and never accesses it, it is fine.
This is possibly my favourite thing I have ever found out about debuggers, and has come in incredibly useful over the years :)