Hello, and welcome to the 11th part of the C / C++ low level curriculum. About time? Definitely!
Last time we looked at the basics of User Defined Types: how structs, classes, and unions are laid out in memory; and (some of) the implications of memory alignment on this picture.
In part 11 we’re going to look at how inheritance affects this picture, in particular the implications for memory layout of derived types and also for their behaviour during construction and destruction (note: we’re leaving multiple inheritance and the keyword virtual out of this picture to start with).
Before We Begin
I will assume that you have already read the previous posts in the series, but I will also put in-line links to any important terms or concepts that you might need to know about to make sense of what you’re reading. I’m helpful like that.
Another big assumption I’m going to make is that you’re already very familiar with the C++ language and comfortable using the language features we’re discussing, as well as the accepted usage limitations of those features etc. If I need to demonstrate anything out of the ordinary I’ll explain it – or at least link to an explanation.
In this series I discuss what happens with vanilla unoptimised win32 debug code generated by the VS 2010 compiler. The specifics will differ on other platforms (and probably with other compilers), but because it's all assembly generated by a C++ compiler the general sweep of the code should be basically the same, so following the examples given here with a source / disassembly debugger on your platform of choice should provide you with the same insights we get here.
With this in mind, in case you missed them, here are the backlinks to the previous posts in the series:
- http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/
- http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/
- http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
- http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/
- http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/
- http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/
- http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/
- http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/
- http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/
- http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/
I won’t lie – it’s not light reading :)
Class vs. Struct: a Gentle Reminder
The C++ keywords struct and class define types that are identical in implementational detail and what you can do with them (the only difference being at the language level: the default access specifier if none is specified is private for class, and public for struct).
So, whilst I will be using the keyword class throughout this article please take it as read that anything we talk about here applies equally to types defined using the keyword struct.
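If you want to convince yourself of this, a couple of compile time checks will do it. This is only a minimal sketch: SPoint and CPoint are hypothetical types invented for the illustration (they are not in the zipped projects), and it assumes a compiler with C++11 support for static_assert and alignof.

```cpp
#include <type_traits>

// Hypothetical example types, made up purely for this illustration.
struct SPoint           // members of a struct are public by default
{
    int _iX;
    int _iY;
};

class CPoint            // members of a class are private by default...
{
public:                 // ...so public access has to be asked for explicitly
    int _iX;
    int _iY;
};

// Same members declared in the same order: same size and same alignment.
static_assert( sizeof( SPoint )  == sizeof( CPoint ),  "sizes differ" );
static_assert( alignof( SPoint ) == alignof( CPoint ), "alignments differ" );

// As far as the type system is concerned, both are 'class types'.
static_assert( std::is_class< SPoint >::value, "SPoint is not a class type" );
static_assert( std::is_class< CPoint >::value, "CPoint is not a class type" );
```

If either keyword produced a different kind of type the file would simply fail to compile.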
What happens when we derive from another type?
So, what does happen when you derive a user defined type from another non built-in type?
Clearly the data members you specify in the declarations have to go somewhere, and so do all those specified in the type(s) you are deriving from.
At the level of C++ there is nothing other than the standard to tell you how this works – and nothing other than looking at what happens with the code generated by the compiler you are using will tell you for definite.
As in the last post, we will be relying heavily on the frankly awesome secret 007 compiler flag /d1reportSingleClassLayout in order to tell us exactly how the (Visual Studio 2010 win32 x86) compiler has decided to lay our example structures out in memory.
It’s about time to look at some example code, so, rather than have you go through the usual rigmarole of setting up your project I have kindly set one up for you.
The zip file in this link contains a VS2010 solution with a single project and .cpp file lovingly set up to run the code shown below, which is in 00_Inheritance.cpp:

```cpp
class CTestBase
{
public:
    int _iA;
    int _iB;
};

class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;
};

int main(int argc, char* argv[])
{
    return 0;
}
```
When you compile this project you should get the following in your “Build” output window (the magic of /d1reportSingleClassLayout!) :
```
1>  class CTestBase size(8):
1>      +---
1>   0  | _iA
1>   4  | _iB
1>      +---
1>
1>  class CTestDerived size(16):
1>      +---
1>      | +--- (base class CTestBase)
1>   0  | | _iA
1>   4  | | _iB
1>      | +---
1>   8  | _iC
1>  12  | _iD
1>      +---
```
Looking at this, it should be fairly obvious that the data members of CTestDerived have just been concatenated onto the end of the memory layout of CTestBase - and, more importantly, that the memory layout of CTestBase within CTestDerived is identical to that when it’s not a base class.
It’s that simple! (for certain definitions of ‘it’ and ‘simple’…)
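If you're on a compiler that doesn't have an equivalent of /d1reportSingleClassLayout, you can get roughly the same information at runtime by measuring how far each member is from the start of its object. The sketch below is one way of doing that; ByteOffsetOf and PrintDerivedLayout are hypothetical helpers I've made up for the illustration, and it assumes a C++11 compiler (for the %zu format specifier).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

class CTestBase
{
public:
    int _iA;
    int _iB;
};

class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Hypothetical helper: returns the distance in bytes from the start of
// rObject to rMember, which must be a data member of rObject.
template< typename TObject, typename TMember >
std::size_t ByteOffsetOf( const TObject& rObject, const TMember& rMember )
{
    const char* pObject = reinterpret_cast< const char* >( &rObject );
    const char* pMember = reinterpret_cast< const char* >( &rMember );
    assert( pMember >= pObject );   // a member can't start before its object
    return static_cast< std::size_t >( pMember - pObject );
}

// Prints the offset of every data member of a CTestDerived instance.
void PrintDerivedLayout( void )
{
    CTestDerived cDerived = CTestDerived();
    std::printf( "_iA at %zu\n", ByteOffsetOf( cDerived, cDerived._iA ) );
    std::printf( "_iB at %zu\n", ByteOffsetOf( cDerived, cDerived._iB ) );
    std::printf( "_iC at %zu\n", ByteOffsetOf( cDerived, cDerived._iC ) );
    std::printf( "_iD at %zu\n", ByteOffsetOf( cDerived, cDerived._iD ) );
}
```

Calling PrintDerivedLayout() from main should print offsets that agree with whatever layout your compiler reports; running the same helper on a CTestBase instance should give the same offsets for _iA and _iB.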
Armed with this information from last post:
“A guarantee is given in both the C and C++ language specifications that memory address of each member will be higher than that of the one declared before it (see this post on Stack Overflow for more detail of the wording).”
it is obvious that – since CTestDerived inherits all of the members of CTestBase - its members must appear after those of CTestBase in memory.
I remember when I had this explained to me, not long after having started my first job in the industry as a fresh faced graduate. I did the internal equivalent of a double-take, because the information I had just received was so bleedingly obvious that I couldn’t believe I’d ever not known it.
If it’s that easy, why post about it?
Good question!
The fact that the memory layout of a type is identical in all situations is required by the standard – and also by logic – let’s see why…
First, download and open the second zipped VS2010 project file - this contains the code below in 01_InheritanceWithFunctions.cpp:
```cpp
class CTestBase
{
public:
    int _iA;
    int _iB;

    CTestBase( int iA, int iB )
    : _iA( iA )
    , _iB( iB )
    {}

    int SumBase( void )
    {
        return _iA + _iB;
    }
};

class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;

    CTestDerived( int iA, int iB, int iC, int iD )
    : CTestBase ( iA, iB )
    , _iC ( iC )
    , _iD ( iD )
    {}

    int SumDerived( void )
    {
        return _iA + _iB + _iC + _iD;
    }
};

int main(int argc, char* argv[])
{
    CTestBase    cTestBase   ( argc, argc + 1 );
    CTestDerived cTestDerived( argc, argc + 1, argc + 2, argc + 3 );

    return cTestBase.SumBase() + cTestDerived.SumBase() + cTestDerived.SumDerived();
}
```
Put a breakpoint on the return statement from main, and then compile and run the release build configuration.
The first thing to note is that the memory layouts printed to the output window during the build are unaffected by the addition of these functions.
This is what you would expect, as we know that non-virtual member function calls are resolved at compile time just like regular non-member and static member functions.
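A quick way to see that non-virtual member functions cost nothing per instance is to compare a type that has one against a type that doesn't. This is just a sketch under the assumption of a C++11 compiler: CDataOnly, CDataWithFunction, and SumFree are hypothetical names made up for the illustration, and SumFree is simply my hand-written idea of what a member function looks like once this is made explicit.

```cpp
#include <cassert>

// Hypothetical type with data members only.
class CDataOnly
{
public:
    int _iA;
    int _iB;
};

// Same data members, plus a non-virtual member function.
class CDataWithFunction
{
public:
    int _iA;
    int _iB;

    int Sum( void ) const
    {
        return _iA + _iB;
    }
};

// Roughly what Sum() amounts to: an ordinary function that is handed the
// object's address as an explicit parameter.
int SumFree( const CDataOnly* pThis )
{
    assert( pThis != nullptr );
    return pThis->_iA + pThis->_iB;
}

// The member function lives in the code, not in the object.
static_assert( sizeof( CDataOnly ) == sizeof( CDataWithFunction ),
               "a non-virtual member function changed the size of the type" );
```

Given CDataOnly cData = { 1, 2 }; the call SumFree( &cData ) does the same arithmetic as calling Sum() on a CDataWithFunction holding the same values.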
Since CTestDerived is derived from CTestBase, we know from our high level knowledge about C++ that we can call both of these functions on an instance of CTestDerived – what we’re looking at right now is how this is implemented.
When the breakpoint is hit, right click and choose “Go To Disassembly”.
I’ve pasted the part I’d like to discuss below…
(N.B. to get the same disassembly as this you should have the following Viewing Options checked in the disassembly window: ‘Show source code’, ‘Show line numbers’, ‘Show address’, and ‘Show symbol names’)
```
    44:     return cTestBase.SumBase() + cTestDerived.SumBase() + cTestDerived.SumDerived();
0129109A  lea         ecx,[cTestDerived]
0129109D  call        CTestDerived::SumDerived (1291060h)
012910A2  lea         ecx,[cTestDerived]
012910A5  mov         esi,eax
012910A7  call        CTestBase::SumBase (1291020h)
012910AC  lea         ecx,[cTestBase]
012910AF  add         esi,eax
012910B1  call        CTestBase::SumBase (1291020h)
012910B6  pop         edi
012910B7  add         eax,esi
```
We’ve previously covered that the win32 calling convention for member functions (‘thiscall’) passes this to member functions in the ecx register.
Correspondingly, you’ll notice that the addresses of cTestBase and cTestDerived are loaded into ecx using lea (‘load effective address’) immediately before their member functions are called.
Specifically, note that the address of cTestDerived is passed un-tampered with in ecx when calling the base class function CTestBase::SumBase. Remember this for later (and for the next post!).
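You can observe the same thing from C++ without opening the disassembly window by having a base class function and a derived class function each report the value of this that they were given. The sketch below assumes single, non-virtual inheritance as used throughout this post; ThisInBase, ThisInDerived, and CheckThisIsUntampered are hypothetical names added for the illustration.

```cpp
#include <cassert>

class CTestBase
{
public:
    int _iA;
    int _iB;

    // Reports the value of 'this' as seen inside a base class function.
    const void* ThisInBase( void ) const
    {
        return this;
    }
};

class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;

    // Reports the value of 'this' as seen inside a derived class function.
    const void* ThisInDerived( void ) const
    {
        return this;
    }
};

// Asserts that the base class function is handed the same address as the
// derived class function when both are called on the same instance.
void CheckThisIsUntampered( const CTestDerived& rDerived )
{
    assert( rDerived.ThisInBase() == rDerived.ThisInDerived() );
}
```

With the compilers I'd expect most people to be using, passing any CTestDerived instance to CheckThisIsUntampered should never fire the assert.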
So, let’s look at the disassembly for CTestBase::SumBase and CTestDerived::SumDerived - I tend to single step the disassembly and step into them, but putting breakpoints in them is more reliable :)
CTestBase::SumBase
```
    14:     int SumBase( void )
    15:     {
    16:         return _iA + _iB;
01291020  mov         eax,dword ptr [ecx+4]
01291023  add         eax,dword ptr [ecx]
    17:     }
01291025  ret
```
CTestDerived::SumDerived
```
    33:     int SumDerived( void )
    34:     {
    35:         return _iA + _iB + _iC + _iD;
01291060  mov         eax,dword ptr [ecx+0Ch]
01291063  add         eax,dword ptr [ecx+8]
01291066  add         eax,dword ptr [ecx+4]
01291069  add         eax,dword ptr [ecx]
    36:     }
0129106B  ret
```
We can see that all offsets from ecx used in both functions correspond to the memory layouts we have in the build output for the type that the function belongs to.
Since _iA and _iB are at the same offset within both CTestBase and CTestDerived (i.e. 0 and 4 bytes respectively), CTestBase::SumBase can safely be called on instances of CTestDerived.
We already know that this is possible from our high level understanding of C++, but now we know the implementational detail that makes it possible.
Whilst the specifics of the disassembly will probably differ from platform to platform, the principles underlying its operation should not.
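To make the point about offsets concrete, here is a rough C++ imitation of what the disassembly for CTestBase::SumBase does: it is handed a raw address and reads two ints at fixed byte offsets from it. This is a sketch rather than anything you'd want in real code; LoadIntAtOffset and SumBaseByHand are hypothetical helpers, and it assumes 4 byte ints, a base class that starts at offset 0 of the derived object, and a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class CTestBase
{
public:
    int _iA;
    int _iB;

    CTestBase( int iA, int iB ) : _iA( iA ), _iB( iB ) {}

    int SumBase( void ) { return _iA + _iB; }
};

class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;

    CTestDerived( int iA, int iB, int iC, int iD )
    : CTestBase( iA, iB ), _iC( iC ), _iD( iD ) {}
};

// The offsets used below are only right if an int is 4 bytes.
static_assert( sizeof( int ) == 4, "this sketch assumes 4 byte ints" );

// Hypothetical helper: reads an int stored uOffset bytes after pThis,
// much like 'dword ptr [ecx+offset]' does in the disassembly.
int LoadIntAtOffset( const void* pThis, std::size_t uOffset )
{
    assert( pThis != nullptr );
    int iValue = 0;
    std::memcpy( &iValue,
                 static_cast< const unsigned char* >( pThis ) + uOffset,
                 sizeof( iValue ) );
    return iValue;
}

// Hand-written stand-in for CTestBase::SumBase: it knows nothing about the
// type it is given, only the offsets of _iB (4) and _iA (0).
int SumBaseByHand( const void* pThis )
{
    return LoadIntAtOffset( pThis, 4 ) + LoadIntAtOffset( pThis, 0 );
}
```

Because SumBaseByHand only cares about what is stored at offsets 0 and 4, it gives the same answer as SumBase() whether it is passed the address of a CTestBase or of a CTestDerived.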
Summary
To summarise what we’ve established so far :
1) in member functions, member data of a class is accessed via specific offsets from the this pointer
2) these offsets are constants at compile time and are baked into the assembly code for the member functions
3) this means that the memory layout of the members of a given class must always be identical or the member functions won’t work
If we follow this logic through, we can see that:
4) the memory layout of a class B that inherits from another class A must contain class A‘s members in the same memory layout as class A
5) the memory layout of any given class A is identical regardless of whether it is an instance of A, or it is included in the memory of some type derived from A.
6) note: this behaviour is required by the standard, and (more significantly) by logic.
Finally, it follows that (because each member of a struct must have a higher address than those declared before it):
7) the extra memory required by derived class B will be concatenated onto the end of the memory layout of its base class A
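One way to picture points 4) to 7) is to build the derived type 'by hand', with an instance of the base class as its first member instead of using inheritance. The sketch below compares the two; CTestDerivedByHand is a hypothetical type made up for the illustration, and the checks reflect what I'd expect from mainstream compilers for this simple case rather than anything the standard promises for every type.

```cpp
#include <cstddef>

class CTestBase
{
public:
    int _iA;
    int _iB;
};

// The real thing: inheritance.
class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Hypothetical hand-rolled equivalent: the base is just the first member.
class CTestDerivedByHand
{
public:
    CTestBase _cBase;
    int       _iC;
    int       _iD;
};

// Both versions should take up the same amount of memory...
static_assert( sizeof( CTestDerivedByHand ) == sizeof( CTestDerived ),
               "inheritance and composition gave different sizes" );

// ...with the base at the very start and the extra members straight after it.
static_assert( offsetof( CTestDerivedByHand, _cBase ) == 0,
               "base is not at the start" );
static_assert( offsetof( CTestDerivedByHand, _iC ) == sizeof( CTestBase ),
               "_iC does not immediately follow the base" );
```

This is also more or less how you would fake single inheritance in plain C.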
That’s all for now – next time we’ll look at how multiple inheritance affects this picture.
I know it’s pretty short, but this just means the next one will get here more quickly :)
Epilogue – for those who wondered what I changed in the project settings
There are quite a few changes to the default VS2010 win32 console app project properties in the projects I’ve zipped up for this post.
The changes are all about making the optimised release build configuration leave the code structure alone (i.e. not strip out or ‘fold’ functions to save exe size, and not inline functions), and about preventing extraneous ‘debug checking’ code being inserted (which makes function calls slower, and the code harder to follow in disassembly):
- turning off ‘Whole Program Optimisation’ (Configuration Properties->General)
- turning off ‘Inline Function Expansion’ (Configuration Properties->C/C++ ->Optimisation)
- turning off ‘Basic Runtime Checks’ (Configuration Properties->C/C++ ->Code Generation)
- getting rid of pre-compiled headers to streamline the number of files (Configuration Properties->C/C++ ->Precompiled Headers)
- turning off ‘Enable COMDAT folding’ (Configuration Properties->Linker-> Optimization)
Essentially, this makes the Release configuration assembly have the same structure as the Debug one WRT function calls.
Also, I use the argc parameter to main as input to the code, and return a value computed from it, so that the optimiser can’t assume constant input or output values.
If you use constant inputs, or don’t output a value computed from the inputs, then it’s pretty hard to convince the optimiser not to optimise the entire .exe to ‘return 0;’… ;)
Shout out
Thanks (again) to Bruce – king (or at the very least duke) of advice and peer review.