Channel: #AltDevBlog » Programming

C/C++ Low Level Curriculum Part 9: Loops


Welcome to the 9th post in this C/C++ low level curriculum series I’ve been doing. It’s been a long time since post 8 (way longer than I thought it was), a fact I can only apologise for. My 3 year old son stopped having a nap in the afternoon in late April and it’s totally ruined my productivity…

This post covers the 3 built-in looping control structures (while, do-while, and for) as well as the manual if-goto loop (old school!); as usual, we look in some detail at what the assembly generated by the compiler looks like. Did I forget about the new range-based for loop that was added in the C++11 standard? Nope. If you have access to a C++11 compliant compiler you’re more than welcome to look at that yourself – think of it as homework…

Here are the backlinks for preceding articles in the series (warning: it might take you a while, the first few are quite long):

  1. http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/
  2. http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/
  3. http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
  4. http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/
  5. http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/
  6. http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/
  7. http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/
  8. http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/

 

A brief history of looping

It occurred to me that a sensible order to cover the looping constructs of the C/C++ language might be to address them in the order in which they were introduced into the language.

A couple of years back a friend showed me a brilliant website / article that covered the evolution of the C programming language. It was very interesting, and from what I can remember, contained information on the order in which the various features of the C compiler were added – including which looping construct came first. I tried to find it on t’ internet, but failed. Feel free to link me up in a comment if you happen to know where it is…

Since I couldn’t find the article/website in question I’ve decided to cover them in the order of the amount of work they do automatically for the programmer, which in my opinion is: if-goto, while, do-while, and finally for.

This seems to me to be a sensible order for 2 reasons; firstly because it’s likely to be the order in which they were introduced into programming languages, and secondly because the concepts encapsulated by these constructs sort of build on each other in that order.

 

if-goto

From our previous excursions into the land of assembly we are already familiar with the concept of jumping the execution address, and with the concept of ‘conditional jumping’ (i.e. conditionally changing the execution address). The most direct way to loop the execution of a piece of code several times (as opposed to the simplest to type) is to use the high level keywords that correspond to these assembly level concepts.

We are already familiar with the keyword if, but we’ve not really covered goto, possibly the most maligned of all the language features of C/C++, and almost certainly the most frequently banned by corporate coding standards.

Personally I don’t think that goto is inherently more dangerous than (for example) operator overloading; but the purpose of this article is not to discuss the merits of goto or, for that matter, operator overloading. If you’re interested, here’s the Wikipedia page, which covers (and links to) the arguments for and against it in a fair amount of detail.

So let’s get on with it.

Here’s the first code snippet (see the previous article for how to set up a project that will just accept this code…)

#include "stdafx.h"
 
#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))
 
int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
    int iLoop      = 0;
 
LoopStart:
    if( iLoop < ARRAY_SIZE(k_aiData) )
    {
        iSum += k_aiData[ iLoop ];
        ++iLoop;
        goto LoopStart;
    }
 
    return 0;
}

You should be able to see that this code simply loops over the values in the array k_aiData and sums them; other than the use of if and goto, it’s essentially a standard loop to iterate an array.

The pre-processor macro ARRAY_SIZE that I’ve used here is a simple way to make dealing with statically allocated arrays less error prone. Essentially we could initialise the array k_aiData with any number of elements we wanted to and the rest of the code would still just work. There are simple ways to achieve this in a type safe manner using templates too, but I chose to use a macro here because a readable version of the code takes up less vertical space than the template.
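For comparison, the template alternative mentioned above can be sketched like this. The name ArraySize is my own and the sketch assumes a C++11 compiler (for constexpr); C++17 later added std::size in <iterator>, which does the same job. The array size N is deduced from a reference to the array, so accidentally passing a pointer fails to compile rather than silently giving a wrong answer as the macro would.

```cpp
#include <cstddef>

// Hypothetical helper: deduces N from a reference to an array of N elements.
// A pointer cannot bind to 'const T (&)[N]', so ArraySize( somePointer )
// is a compile error, whereas ARRAY_SIZE( somePointer ) would quietly
// compute sizeof( pointer ) / sizeof( element ).
template <typename T, std::size_t N>
constexpr std::size_t ArraySize( const T (&)[N] ) noexcept
{
    return N;
}
```

Used on the snippet’s array, ArraySize( k_aiData ) yields 8, and because it is constexpr the result is still a compile time constant, just like the macro’s.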

If you are wondering why I am not incrementing iLoop inside the square brackets, this is so that the high level code that is doing the work of the loop is identical across all code snippets.

If you are also wondering why I am using the prefix rather than the postfix version of operator++, then well done to you – award yourself 6.29 paying attention points. In this case it makes no difference to the assembly generated, but in these days of operator overloading it’s generally better to use the prefix version as a point of good practice – unless of course you require postfix behaviour (the first comment on the first answer to this question on Stack Overflow should prove illuminating if you don’t know what the implications of the different behaviours are).
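To see why prefix is the safer habit once operators are overloaded, here is a minimal sketch using a hypothetical Counter type. The postfix overload has to copy the object’s old state so that it can return it, whereas prefix just updates the object and returns a reference to it. For a type this small the compiler will usually optimise the copy away; for heavyweight iterator types it may not.

```cpp
// Hypothetical type, used only to contrast the two operator++ overloads.
struct Counter
{
    int value = 0;

    // Prefix ++c: increment in place and return *this; no copy is made.
    Counter& operator++()
    {
        ++value;
        return *this;
    }

    // Postfix c++: the unused int parameter only selects this overload.
    // It must copy the old state so it can return it after incrementing.
    Counter operator++( int )
    {
        Counter old = *this;
        ++value;
        return old;
    }
};
```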

Since we’re using two keywords that have a very clear relationship to assembly level concepts, it’s reasonable to assume that the disassembly for this code will be pretty much as we wrote it at the high level. As we all know, we should never assume; so let’s check our assumptions.

Here is the debug x86 disassembly for the looping section:

 1      11: LoopStart:
 2      12:     if( iLoop < ARRAY_SIZE(k_aiData) )
 3  00BB1299  cmp         dword ptr [ebp-2Ch],8
 4  00BB129D  jae         LoopStart+1Eh (0BB12B7h)
 5      13:     {
 6      14:         iSum += k_aiData[ iLoop ];
 7  00BB129F  mov         eax,dword ptr [ebp-2Ch]
 8  00BB12A2  mov         ecx,dword ptr [ebp-28h]
 9  00BB12A5  add         ecx,dword ptr [ebp+eax*4-24h]
10  00BB12A9  mov         dword ptr [ebp-28h],ecx
11      15:         ++iLoop;
12  00BB12AC  mov         eax,dword ptr [ebp-2Ch]
13  00BB12AF  add         eax,1
14  00BB12B2  mov         dword ptr [ebp-2Ch],eax
15      16:         goto LoopStart;
16  00BB12B5  jmp         LoopStart (0BB1299h)
17      17:     }
18      18: 
19      19:     return 0;
20  00BB12B7  xor         eax,eax

As expected, the disassembly for this is very straightforward, and you should be familiar with almost all of it from previous posts.

As we saw in the first article on conditionals, the assembly code (lines 3 & 4) that maps to the if statement (line 2) tests the logical opposite of the high level code. This is because the high level if conceptually ‘steps into’ the curly brackets it controls if its test passes, whereas the assembly has to jump past the assembly code generated by the content of the if in order to not execute it (remember: curly brackets are a high level convenience for programmers!).

In this case, line 3 compares iLoop (at address [ebp-2Ch]) to 8 (the size of the array obtained from ARRAY_SIZE is a compile time constant), and line 4 uses jae (jump if above or equal) to conditionally jump execution to LoopStart+1Eh (0BB12B7h), which is the memory address immediately after the assembly generated by the content of the curly brackets controlled by the if statement. Note that jae is an unsigned comparison: ARRAY_SIZE yields a size_t, so iLoop is converted to unsigned before it is compared.

The next block of assembly adds the iLoop-th element of k_aiData to iSum. By this point, we should all be familiar with the assembly for adding two integers, and the way in which the elements of k_aiData are accessed is the only real new assembly code idiom that we’re seeing in this disassembly.

The instruction that accesses the iLoop-th element from the array is doing a surprising amount of work for an assembly instruction; certainly this is the first time that we’ve seen any significant computation being performed within a single line of assembly code, and it’s all occurring in the square brackets in the place that usually contains the address of the value we wish to access.

So, let’s look at it in detail:

9  add         ecx,dword ptr [ebp+eax*4-24h]

When line 9 is executed, the eax register holds the value of iLoop, and [ebp-24h] is the address of the array k_aiData.

Given that k_aiData is an array of int, that the address of k_aiData[ 0 ] is [ebp-24h], and that sizeof( int ) is 4 on x86, it should be pretty obvious that the computation [ebp+eax*4-24h] on line 9 equates to the memory address of the iLoop-th element of k_aiData.

If you’re having trouble seeing it, here is the address computation seen in the disassembly rearranged step by step so that we can swap out the registers and memory addresses for the high level variables:

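ebp+eax*4-24h

= ebp + ( eax*4 ) + (-24h)

= ebp + (-24h) + ( eax*4 )

= ( ebp - 24h ) + ( eax*4 )

= &k_aiData[ 0 ] + ( iLoop * sizeof( int ) )

Bear in mind that this is byte arithmetic; in C/C++, pointer arithmetic on an int* already scales by sizeof( int ), so the high level equivalent is simply &k_aiData[ 0 ] + iLoop.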
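The same computation can be reproduced in C++ by doing the byte arithmetic explicitly. This is a minimal sketch (ElementAddress is a hypothetical helper that takes a pointer rather than working on the stack frame): it starts from the byte address of element 0, which plays the role of ebp-24h in the listing, and adds index * sizeof( int ) bytes, which is what the eax*4 term does.

```cpp
#include <cstddef>

// Hypothetical helper: computes the address of data[ index ] by hand,
// mirroring the [base + index*4] addressing seen in the disassembly.
const int* ElementAddress( const int* data, std::size_t index )
{
    // Work in bytes, as the CPU does, rather than in units of int.
    const unsigned char* base = reinterpret_cast<const unsigned char*>( data );
    return reinterpret_cast<const int*>( base + index * sizeof( int ) );
}
```

For every valid index this gives the same pointer as &data[ index ], which is exactly the equivalence the derivation above walks through.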

Now we’ve examined the new elements of the disassembly we’ve not seen before, the rest of this post should clip along fairly quickly :)

So, after the value stored in the iLoop-th element of k_aiData has been added to iSum, all that remains is to ++iLoop (lines 12-14) and then jump back to the label at the start of the loop (line 16).

Clearly this will continue until iLoop >= 8, and so we can see that the assembly is isomorphic with the high level code.

 

Why add Looping Constructs?

Since looping behaviour can be achieved simply with if-goto, this raises the question “Why did Dennis Ritchie (sadly no longer with us) bother with the rest of the looping constructs available in C?”

There are three main reasons that spring to my mind, the first is efficiency (of typing rather than execution), the second is robustness, and the third is clarity of intent.

Writing a loop using the if-goto idiom involves a fair amount of typing, and loops are very common in most code bases. No-one likes to type more than they have to – especially programmers. Since the programmers using the language were probably originally the programmers of the language it was more or less an inevitability that a more textually terse method of writing loops would come about.

Secondly, and more importantly, the code involved in writing any two if-goto loops is very similar, so writing it by hand is more prone to error (as well as more tedious) than using a construct made specifically for looping, which removes the need for the explicit goto and its associated label.

Thirdly, and possibly even more importantly, an explicit looping construct makes the intent of the code far clearer. if and goto both have plenty of uses other than looping, so any programmer coming along later to read code containing an if-goto loop would have to expend significant mental effort just to work out that the code is in fact a loop, which is clearly very bad.

Taken together, these three reasons mean that you will almost certainly never write a loop using if-goto for any reason other than just for fun; and you certainly won’t need to write one. The only reason I am covering it is because I feel that it’s worth considering as a step in the evolution of looping constructs in languages.

 

while

So, we come to while. The while loop is basically an automatic if-goto, and we will see this when we look at the disassembly (which is essentially why I covered the if-goto in the first place).

Here’s the code snippet upgraded to use while:

#include "stdafx.h"
 
#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))
 
int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
    int iLoop      = 0;
 
    while( iLoop < ARRAY_SIZE(k_aiData) )
    {
        iSum += k_aiData[ iLoop ];
        iLoop++;
    }
 
    return 0;
}

Clearly the high level code looks neater already, and (more importantly) the manual elements of putting the if and goto in the right places have been removed; so it’s a lot harder to do something wrong as a result of human error, and it’s instantly obvious that the code is looping over the content of the array k_aiData.

Much better – well done programming language designers of yesteryear!

Now let’s have a look at the (dis)assembly that it generates…

    11:     while( iLoop < ARRAY_SIZE(k_aiData) )
013E1299  cmp         dword ptr [ebp-2Ch],8  
013E129D  jae         main+77h (13E12B7h)  
    12:     {
    13:         iSum += k_aiData[ iLoop ];
013E129F  mov         eax,dword ptr [ebp-2Ch]  
013E12A2  mov         ecx,dword ptr [ebp-28h]  
013E12A5  add         ecx,dword ptr [ebp+eax*4-24h]  
013E12A9  mov         dword ptr [ebp-28h],ecx  
    14:         iLoop++;
013E12AC  mov         eax,dword ptr [ebp-2Ch]  
013E12AF  add         eax,1  
013E12B2  mov         dword ptr [ebp-2Ch],eax  
    15:     }
013E12B5  jmp         main+59h (13E1299h)  
    16: 
    17:     return 0;
013E12B7  xor         eax,eax

Almost entirely unsurprisingly, the assembly that has been generated from the while is essentially identical to that generated for the if-goto we just looked at – only the addresses that are being jumped to have changed.

This is the sort of thing that restores my faith in humanity; well, in compiler programmers specifically but they’re still human. I assume.

 

do-while

Let’s move swiftly on with the code snippet for the next type of loop, the do-while.

#include "stdafx.h"
 
#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))
 
int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
    int iLoop      = 0;
 
    do 
    {
        iSum += k_aiData[ iLoop ];
        ++iLoop;
    } 
    while( iLoop < ARRAY_SIZE(k_aiData) );
 
    return 0;
}

Essentially the same code, but now we’re testing the loop’s exit condition at the end of each loop rather than at the beginning.

All being sane in the universe, I think it would be reasonable to expect the assembly generated for this code to turn out very similar to the previous two loops – except that the testing code is likely to be after the body of the loop rather than before it….

    11:     do 
    12:     {
    13:         iSum += k_aiData[ iLoop ];
00CC1299  mov         eax,dword ptr [ebp-2Ch]  
00CC129C  mov         ecx,dword ptr [ebp-28h]  
00CC129F  add         ecx,dword ptr [ebp+eax*4-24h]  
00CC12A3  mov         dword ptr [ebp-28h],ecx  
    14:         ++iLoop;
00CC12A6  mov         eax,dword ptr [ebp-2Ch]  
00CC12A9  add         eax,1  
00CC12AC  mov         dword ptr [ebp-2Ch],eax  
    15:     } 
    16:     while( iLoop < ARRAY_SIZE(k_aiData) );
00CC12AF  cmp         dword ptr [ebp-2Ch],8  
00CC12B3  jb          main+59h (0CC1299h)  
    17: 
    18:     return 0;
00CC12B5  xor         eax,eax

As expected then, the code doing the work of the loop and incrementing iLoop is basically identical.

Also as expected, the conditional jump that keeps the loop going is a little different. It uses the jump instruction jb (jump if below) so, unlike almost all the other assembly we’ve seen generated from high level conditionals, it tests the same condition as the high level code – but why?

As discussed earlier, the high level language concept of ‘curly bracket scope’ doesn’t exist at the assembly level. Despite this, the compiler has to generate assembly code that is logically isomorphic with the high level code; so in order to satisfy the high level behavioural constraint of ‘stepping into’ the curly bracketed code if a pre-condition is met, the assembly skips over the code within the curly brackets if the condition isn’t met.

So, since the looping condition is a post-condition in a do-while loop (i.e. at the end of the ‘curly bracket scope’ it controls) the high level code and assembly code both need to jump back to the start of the loop if the looping condition is met, and so the test in the assembly code is the same as that at the high level.
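As a sketch of that reasoning, here is the do-while loop written by hand as if-goto in the same shape as the disassembly: body first, then a test that jumps back when the condition is true, just as jb does. SumDoAsGoto is a hypothetical name, and it takes a pointer and a count rather than the local array used in the snippets. Because a do-while always runs its body at least once, the sketch assumes count is at least 1.

```cpp
#include <cstddef>

// Hypothetical do-while expressed as if-goto. The test sits at the bottom
// and jumps back while the loop condition is TRUE (same sense as the C++).
// Precondition: count >= 1, since the body always executes once.
int SumDoAsGoto( const int* data, std::size_t count )
{
    int sum = 0;
    std::size_t i = 0;
LoopStart:
    sum += data[ i ];
    ++i;
    if( i < count )
        goto LoopStart;   // corresponds to the jb back to the start of the body
    return sum;
}
```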

 

for

So, we come to the for loop, the loop you probably use the most often.

The for loop was the looping construct that worked the hardest for you until the C++11 ISO standard introduced the ‘range-based’ for to the language this time last year (not counting the various template based solutions). Unfortunately, although it’s supported in the recently released VC2012, support for the C++11 standard is patchy at best on most video game platforms, so the for loop is still the default solution.

Let’s take a second to look at the ‘anatomy of a loop’. More or less any looping code has 3 responsibilities in addition to the work it does per iteration of the loop:

a) declare and/or initialise loop state variables
b) test loop exit condition
c) update state variables for the next loop

These 3 responsibilities define the scope and manner of the iteration the loop is doing, and therefore can be seen as the ‘fingerprint’ of that iteration.

The for loop is a ‘language level refactoring’ that gathers these three responsibilities into one construct giving them textual adjacency, thus making the entire fingerprint visible in one place.

Whilst this is pretty obvious when you stop to examine it, the importance of explicitly stating this should not be underestimated.

Why? Let’s look at for compared to while, replacing the code with the corresponding a, b, or c from the list above.

for( a; b; c)
{
    //do work
}

as opposed to:

a;
while( b )
{
    //do work
    c;
}

So, the for loop takes up less vertical space than the while (in this instance at least) but what, if anything, are the other advantages:

  • variables declared by a in the for are scoped to the loop. Smaller scope == less entropy == fewer bugs.
  • c is obviously distinct from the work code of the loop in the for, but not so in the while (be honest; how many times have you written an accidental infinite while loop because you forgot to increment at the end?)
  • the adjacency of a, b, and c in the for allows possible bugs with loop conditions to be spotted more easily
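The second point has a concrete consequence worth sketching: continue. In a for loop, continue jumps to c, so the loop state is still updated; in the hand-expanded while form, continue jumps straight back to b and skips a c written at the end of the body. The two functions below are hypothetical examples that sum every element except a given value, once with each construct.

```cpp
#include <cstddef>

// Hypothetical example: sum all elements except those equal to skipValue.
int SumSkippingFor( const int* data, std::size_t count, int skipValue )
{
    int sum = 0;
    for( std::size_t i = 0; i < count; ++i )   // a; b; c
    {
        if( data[ i ] == skipValue )
            continue;                          // still executes ++i (c)
        sum += data[ i ];
    }
    return sum;                                // i is out of scope here
}

// The same loop as 'a; while( b ) { work; c; }'.
int SumSkippingWhile( const int* data, std::size_t count, int skipValue )
{
    int sum = 0;
    std::size_t i = 0;                         // a: visible for the whole function
    while( i < count )                         // b
    {
        if( data[ i ] == skipValue )
        {
            ++i;                               // c must be duplicated here, or
            continue;                          // continue would loop forever
        }
        sum += data[ i ];
        ++i;                                   // c
    }
    return sum;
}
```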

Whoever invented the for loop deserves a pat on the back, because for takes the improvements made by the while and do-while loops to the next level – by reducing human error and increasing the clarity of intent even further.

I looked him up and it turns out that the earliest equivalent to for I found by googling is the DO loop in FORTRAN which was invented in 1957 by a team led by the late John Backus at IBM. Since that’s about as close to an answer as I feel I need to get, I now invite you to join me in a posthumous air high-five to John to celebrate his team’s sterling work.

Let’s look at one now shall we? Here’s the code snippet:

#include "stdafx.h"
 
#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))
 
int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
 
    for( int iLoop = 0; iLoop < ARRAY_SIZE(k_aiData); ++iLoop )
    {
        iSum += k_aiData[ iLoop ];
    } 
 
    return 0;
}

…and here’s the disassembly (n.b. I un-ticked the ‘Show symbol names’ check box in the disassembly display options for this…)

 1      10:     for( int iLoop = 0; iLoop < ARRAY_SIZE(k_aiData); ++iLoop )
 2  00DC1292  mov         dword ptr [ebp-2Ch],0
 3  00DC1299  jmp         00DC12A4
 4  00DC129B  mov         eax,dword ptr [ebp-2Ch]
 5  00DC129E  add         eax,1
 6  00DC12A1  mov         dword ptr [ebp-2Ch],eax
 7  00DC12A4  cmp         dword ptr [ebp-2Ch],8
 8  00DC12A8  jae         00DC12B9
 9      11:     {
10      12:         iSum += k_aiData[ iLoop ];
11  00DC12AA  mov         eax,dword ptr [ebp-2Ch]
12  00DC12AD  mov         ecx,dword ptr [ebp-28h]
13  00DC12B0  add         ecx,dword ptr [ebp+eax*4-24h]
14  00DC12B4  mov         dword ptr [ebp-28h],ecx
15      13:     } 
16  00DC12B7  jmp         00DC129B
17      14: 
18      15:     return 0;
19  00DC12B9  xor         eax,eax

Sooooo … this one looks a little different, right? It’s not very different though, just re-organised a little:

  1. Lines 2-3: initialise iLoop (i.e. [ebp-2Ch]) to 0, then jump over lines 4-6
  2. Lines 4-6: increment iLoop
  3. Lines 7-8: compare iLoop with 8 and exit the loop by jumping to line 19 if iLoop >= 8 (n.b. this is a pre-condition check, so it tests the opposite of the high level condition)
  4. Lines 11-14: index the array and accumulate the sum of the element values (this should look very familiar by now)
  5. Line 16: loop back to line 4

So, the assembly in each of steps 1, 2, and 3 implements one of the semicolon-separated parts of the for loop’s ‘parameters’; in fact, steps 1 to 3 correspond to a (initialise), c (increment), and b (test exit condition) respectively in our ‘anatomy of a loop’ list above.

Only steps 1 and 3 are executed on the first iteration of the loop, and only steps 2 and 3 on all other iterations.

Also note that steps 2 and 3 are in the opposite order in the assembly compared to the high level code – this is, again, down to the disparity between high level nicety and low level execution.
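To make that mapping concrete, here is a sketch of the same for loop written by hand as if-goto, laid out in the order the debug disassembly uses. SumForAsGoto is a hypothetical name and it takes a pointer and a count rather than the local array; the comments point at the matching instructions in the listing, though the compiler is free to arrange its own output differently.

```cpp
#include <cstddef>

// Hypothetical hand-expansion of:
//   for( i = 0; i < count; ++i ) sum += data[ i ];
// in the same order as the debug disassembly.
int SumForAsGoto( const int* data, std::size_t count )
{
    int sum = 0;
    std::size_t i = 0;          // a: initialise          (mov [ebp-2Ch],0)
    goto Test;                  // skip c on entry        (jmp 00DC12A4)
Increment:
    ++i;                        // c: update loop state   (mov / add / mov)
Test:
    if( !( i < count ) )        // b: inverted test       (cmp / jae)
        goto End;
    sum += data[ i ];           // loop body
    goto Increment;             // back to c              (jmp 00DC129B)
End:
    return sum;
}
```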

So, the assembly that is generated from a for loop is more or less as you might expect. We’ve covered all the (non-templated-non-C++11) looping constructs now, end of story – next article. Move along please.

 

Wait! I’m not quite finished!

Hold on! The reason the last post was about how to look at optimised assembly is mostly because I wanted to look at the optimised assembly generated by the C++ looping constructs in this post.

So, rather than re-compile all the snippets one by one let’s set up the project just like in post 8, and then download and paste in this code (massive ‘snippet’): CPPLLC_Part9MoreLoops.

This file contains a simple program that has 4 functions in addition to main – they are:

  • SumGoto – sums the elements of an array using an if-goto loop
  • SumWhile – sums the elements of an array using a while loop
  • SumDo – sums the elements of an array using a do-while loop, and
  • SumFor – sums the elements of an array using a for loop

All very straightforward really. The only unusual thing you might notice is that main looks like this:

int main( int argc, char* argv[] )
{
    // array and a nice const for the size
    const int k_iArraySize = 8;
    int       k_aiData[ k_iArraySize ] = { 0, 1, 2, 3, 4, 5, 6, 7 };
 
    int iSumGoto  = SumGoto ( k_aiData, atoi( argv[ 1 ] ) );
    int iSumWhile = SumWhile( k_aiData, atoi( argv[ 2 ] ) );
    int iSumDo    = SumDo   ( k_aiData, atoi( argv[ 3 ] ) );
    int iSumFor   = SumFor  ( k_aiData, atoi( argv[ 4 ] ) );
 
    std::cout << iSumGoto << iSumWhile << iSumDo << iSumFor;
    return 0;
}

So it’s using command line arguments as input, and printing to stdout for output. This is a relatively simple way to prevent the overzealous optimising compiler from removing all the code – we force it to keep it in there by doing input and output at runtime.
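Because main reads argv[ 1 ] to argv[ 4 ] unconditionally, running it with fewer than four arguments either hands atoi a null pointer or reads past the end of argv; both are undefined behaviour and will usually crash. If you want something slightly less fragile, a hypothetical helper like the one below checks argc first. The value still comes from the command line at runtime, so the optimiser can’t remove the loops any more easily; note that it doesn’t range-check the count against the array size either.

```cpp
#include <cstdlib>

// Hypothetical helper: atoi( argv[ index ] ) if that argument was supplied,
// otherwise fallback. Guards against running with too few arguments.
int ArgOrDefault( int argc, char* argv[], int index, int fallback )
{
    return ( index < argc ) ? std::atoi( argv[ index ] ) : fallback;
}
```

In main this would be used as, e.g., SumGoto( k_aiData, ArgOrDefault( argc, argv, 1, k_iArraySize ) ).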

Before we compile and run it, you’ll also need to make a couple of changes in your project’s property pages – make sure you have the ‘Release’ build configuration selected…

The first is to pass some command line arguments to the code – apart from any other reasons, this is shockingly naive code and will crash if it doesn’t get the arguments it expects, so add the following (which will make it iterate k_aiData fully for each function):

screenshot of adding command line parameters in the project's property pages

 

We also need to turn off function inlining or the compiler will optimise away all the function calls making the disassembly much harder to follow:

screenshot of properties page showing how to turn off function inlining

Final pre-launch check: add a breakpoint to the C++ line in each loop that sums the loop’s elements (i.e. ‘iSum += xxxx’), and off we go!

 

Optimised Disassembly O’clock!

Build and run the code and you should end up with your debugger stopped on the breakpoint you have put in SumGoto.

Right click and choose ‘Go To Disassembly’, you should see something like the image below – but before we look at it in detail, a brief aside is needed:

The code in main that calls SumGoto looks like this:

00DB191A  push        eax  
00DB191B  lea         esi,[ebp-24h]  
00DB191E  call        SumGoto (0DB1880h)

eax (which contains the element count returned by atoi( argv[ 1 ] ), i.e. SumGoto’s iDataCount parameter) is pushed onto the stack, but the address of k_aiData[ 0 ] (computed by lea esi,[ebp-24h]) is stored into esi rather than being pushed onto the stack.

“Wait!” I hear you say, “They just did who in a whatnow? I thought we covered calling conventions, and no-one said anything about using esi for parameter passing!”

Don’t worry about this for now, just accept that – for whatever reason – in this case the address of k_aiData[ 0 ] is being passed via the esi register (I investigate this in the article’s epilogue if you’re really interested).

So, here’s the disassembly for SumGoto:

screenshot of the optimised SumGoto disassembly – make sure you have the same view options checked in the context menu, or your disassembly may look very different!

Interestingly this bears little visible relation to the debug disassembly we looked at for the if-goto earlier. So let’s pick it apart to see what it’s doing differently:

  1. 00DB1880 to 00DB1884 – function prologue of SumGoto.
  2. 00DB1885 – moving function parameter iDataCount (i.e. the number of loops) into the edi register.
  3. 00DB1888 to 00DB188E – initialising registers ecx, edx, ebx, and eax to 0 (n.b. anything XOR itself is 0).
  4. 00DB1890 to 00DB1893 – compare edi (number of loops remaining) with 2; if less, jump to 00DB18A7 (2nd instruction in step 9), otherwise continue.
  5. 00DB1895 – another new assembly instruction; dec decreases its register operand by 1 – in this case edi (iDataCount).
  6. 00DB1896 to 00DB1899 – we know that the address of k_aiData[0] is in esi, so from the address calculation in the square brackets it is pretty obvious that these two lines are indexing into k_aiData and summing the odd and even elements into edx and ecx respectively.
  7. 00DB189D – is incrementing eax by two. eax clearly contains the count of elements that have been looped over so far – because…
  8. 00DB18A0 to 00DB18A2 – …are comparing eax to edi. If eax < edi, execution jumps back to step 6.
  9. 00DB18A4 to 00DB18AB – this ties in with the decrement to edi made at step 5. Since the code is looping and summing 2 elements at a time, this code checks if iDataCount was odd or even. If odd it jumps to step 11, if even it jumps to step 12.
  10. 00DB18AD – leaves ecx unchanged. What is it for? It’s effectively a nop (no operation) instruction; nops are used in assembly code for various reasons, such as maintaining the memory alignment of certain instructions (the 1st answer to this question on Stack Overflow explains sufficiently for our requirements at this point). In any case, both possible code paths through step 9 skip this instruction entirely.
  11. 00DB18B0 – if iDataCount was odd, this code moves the value of the array element that would have been missed by iterating 2 elements at a time into ebx.
  12. 00DB18B3 – this uses lea to add the sums of odd and even elements of k_aiData that have been accumulating in edx and ecx and store them in eax (remember, eax is used to return integer values from functions).
  13. 00DB18B6 – this is actually the start of the epilogue of SumGoto – restoring edi to the value it held before SumGoto was called. There’s no particular reason for this to have been placed before the next instruction; optimising compilers do this sort of thing relatively often, and as long as the code they generate is correct it’s not worth worrying about too much.
  14. 00DB18B7 – this line adds the value from ebx (see step 11) to the sum to be returned in eax.
  15. 00DB18B9 to 00DB18BB – function epilogue of SumGoto.
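Putting those steps together, the optimised SumGoto behaves roughly like the C++ below. This is my own reconstruction from the step descriptions, not code the compiler produced; SumPairwiseSketch and the variable names are hypothetical, and the exact form of the odd/even test in step 9 is an assumption.

```cpp
// Hypothetical C++ reading of the optimised SumGoto: two running sums,
// two elements per iteration, plus one leftover element if count is odd.
int SumPairwiseSketch( const int* data, int count )
{
    int sumA = 0;   // edx: one running sum
    int sumB = 0;   // ecx: the other running sum
    int tail = 0;   // ebx: leftover element when count is odd
    int i    = 0;   // eax: index of the next element

    if( count >= 2 )                  // step 4: fewer than 2 skips the loop
    {
        const int last = count - 1;   // step 5: dec edi
        do
        {
            sumA += data[ i ];        // step 6: two elements per iteration
            sumB += data[ i + 1 ];
            i += 2;                   // step 7
        }
        while( i < last );            // step 8
    }

    if( i < count )                   // step 9: one element left over?
    {
        tail = data[ i ];             // step 11
    }

    return sumA + sumB + tail;        // steps 12 and 14
}
```

Tracing it with a count of 7 shows the structure: the loop handles elements 0 to 5 in three pairs, and element 6 is picked up by the leftover check.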

Ouch. That seems far more complex than the debug assembly code for the if-goto loop. You may have to read through it a few times before you satisfy yourself about how it works – I recommend stepping through it in the debugger looking at the registers in a watch window.

Somewhat surprisingly, SumWhile and SumFor look pretty much exactly like SumGoto, but SumDo is way smaller:

SumDo:
00DB1830  xor         eax,eax  
00DB1832  xor         ecx,ecx  
00DB1834  add         eax,dword ptr [esi+ecx*4]  
00DB1837  inc         ecx  
00DB1838  cmp         ecx,edx  
00DB183A  jl          SumDo+4 (0DB1834h)  
00DB183C  ret

This is incredibly simple to follow, and much more the sort of thing I would intuitively have expected to see for all of the looping constructs, but there is method in the compiler’s seeming madness…

Summing two elements per iteration of the loop like the assembly of SumGoto, SumWhile, and SumFor are doing is actually a form of loop unrolling. Although (in this code) the compiler doesn’t know how many iterations of the loop it will end up doing, it can still improve the overall ‘looping instructions to working instructions’ ratio of the loop by this pairwise unrolling. Over a large enough array it should be faster than code that is not unrolled in the same way.

By changing the compiler options (under C/C++ -> Optimisation) from ‘Maximize Speed (/O2)’ to ‘Minimize Size (/O1)’ you can generate assembly that looks a lot more like what you would expect. Since /O2 is the default for release build configurations in Visual Studio 2010 I thought I should explain this assembly, and I leave looking at the assembly generated by /O1 as an exercise for you, dear reader :)

Conclusions

So, there we have it: looping constructs, and a genuine taste of the differences between optimised and debug assembly – albeit in a massively simplified scenario compared to real code.

What should we take away from this? Well, I guess primarily the point of this was to demonstrate that whilst the optimising compiler is constrained to generate assembly code that is isomorphic with your high level code, you should never take it for granted that the code it generates will look how you expect it to.

This should, I think, about finish up the program control / structural aspects of C/C++ and leave us free to move on to look at the way other mechanics of the language work at the assembly level.

I feel that there might possibly be a post on the range based for and on recursion at some point, but we’ll see – feel free to leave a comment if you think there’s something glaring that I’ve left out and I’ll try to rectify that before moving on…

Finally, a hearty thank you to all the AltDevAuthors who chipped in with sage advice on this post – Tony, Paul, Ted, Bruce, Ignacio, and Rich.

 

Epilogue

Notes on [ebp+eax*4-XXh]

Whilst this addressing mode seems like magic, there are limitations on the computations that can be performed within the square brackets in this way – see this article on Wikipedia for a summary of the limits.

This addressing mode is also commonly seen in conjunction with another x86 assembly instruction called lea (load effective address), as in the optimised SumGoto assembly, which loads the result of the address computation (rather than the value at that address) into a register.

When I’ve seen the mnemonic lea in the disassembly window it has most often been used for this purpose – though don’t assume that it is! Since we’re not (necessarily) assembly programmers, we don’t need to worry about this too much but I thought I’d mention it.

Notes on Using esi to pass parameters to functions

So, this is certainly not what we’d expect given the coverage of calling conventions we did earlier in the series.

I googled for at least 10 minutes (clearly not exhaustive, but usually long enough to find a trail to an answer) and couldn’t find any specific information pertaining to the use of esi to pass parameters in a documented calling convention; however I did find several other people who had observed this behaviour and were looking for answers about it.

So, in the spirit of discovery I decided to see what happened if I compiled the looping functions (SumGoto, SumWhile, SumDo, and SumFor) into a separate library and then linked to that library instead of having them compile inside the same logical compilation unit as main. As anticipated, this sorted out the parameter passing so that it conformed to the cdecl calling convention, no more kooky use of esi to pass the array.

What do we conclude from this then? Well, it seems that if the compiler knows that the code it’s generating isn’t going in a library (or you have Link Time Code Generation enabled), and so the code only has to conform to the ‘local’ calling conventions of the executable it’s generating, then the compiler takes liberties with the calling conventions in order to optimise function parameter passing. Here are a couple of links from Bruce on the matter: from MSDN (mentions it, but no specifics to speak of) and from StackOverflow.

Final take away point: if something makes no sense when you’re debugging, don’t assume anything – put on your Deerstalker and Sherlock Holmes your way to the bottom of it.

A final note on the genesis of loops

I’ve already mentioned that I didn’t find the page on the history of C that I was looking for, so I can’t say with any degree of certainty which order the various looping constructs were actually added to the language.

However, in my searching I did find this interesting little nugget of information on Stack Exchange about the history of looping – my personal gut feeling on this matter is that whoever first coined the use of Sigma in mathematical notation is probably the father (or mother) of programmatic looping, but whoever invented knitting is the true originator ;)

 

 

 

