Monday, February 19, 2007 12:53 AM bart

Inlining - yes, it happens

Some time ago I was giving a small lecture on .NET development talking about some CLR internals (largely based on SSCLI). During the talk on the JIT, the question of inlining popped up (does it happen?). This post shows inlining in action based on a little perf measurement. For more details, visit David Notario's blog for a few detailed posts on inlining. However, I was out of luck at the time of delivering the talk since I didn't have an internet connection (or more precise, only a crappy wireless one that wasn't to be called "reliable") so I had to come up with a sample on the fly. Luckily my temp folder (the one I keep all my demos in) contained a simple quicksort implementation that I used as a running example to illustrate some array stuff (unsafe code included), so I did instrument it with some Stopwatch stuff and a few large array sorting requests to see the impact.

The magic in the fragment below happens in the Swap function (one that C/C++ guys often would like to macro-ize or inline) - or better, when this particular function is called. Inlining basically means that the method call is eliminated and the code is put in place of the call (in a kind of "copy-paste" manner), which wipes out the overhead associated with the method call. In the managed world, this is something the JIT compiler can take care of. One limitation is that the method shouldn't be any larger than 32 bytes of IL, which is the case in our sample - so we're in luck.

1 using System; 2 using System.Diagnostics; 3 4 class Program 5 { 6 static Random rand = new Random(); 7 8 static void Main() 9 { 10 long ms = 0; 11 12 Stopwatch sw = new Stopwatch(); 13 for (int i = 0; i < 10; i++) 14 { 15 int[] array = GetArray(); 16 sw.Start(); 17 Qsort(array); 18 sw.Stop(); 19 ms += sw.Elapsed.Ticks; 20 } 21 22 Console.WriteLine(ms); 23 } 24 25 static int[] GetArray() 26 { 27 int[] array = new int[1000000]; 28 for (int i = 0; i < array.Length; i++) 29 array[i] = rand.Next(); 30 return array; 31 } 32 33 static void Qsort(int[] array) 34 { 35 Qsort(array, 0, array.Length - 1); 36 } 37 38 static void Swap(int[] array, int i, int j) 39 { 40 int t = array[i]; 41 array[i] = array[j]; 42 array[j] = t; 43 } 44 45 static void Qsort(int[] array, int from, int to) 46 { 47 if (from < to) 48 { 49 int b = from, e = to; 50 int p = array[b]; 51 52 while (b < e) 53 { 54 while ( array[e] > p) e--; 55 while (b < e && array[b] <= p) b++; 56 57 if (b < e) 58 Swap(array, b, e); 59 } 60 61 Swap(array, from, b); 62 Qsort(array, from, b - 1); 63 Qsort(array, b + 1, to); 64 } 65 } 66 }

Execute (release build) this piece of code and observe the execution time. (Note: The implementation above isn't the ideal quicksort implementation; use Array.Sort for "production" purposes, which is much better and more generic. One caveat in the code is on line 50 for the pivot choice - why?).

Next, change the code for the Swap-method as follows:

[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)] static void Swap(int[] array, int i, int j) { int t = array[i]; array[i] = array[j]; array[j] = t; }

and execute (release build) again. Observe the execution time now. On my machine the results are like this:

  • With inlining - 111631616 ticks
  • Without inlining - 130200309 ticks

So, yes, it matters (about 15% perf difference in this sample)! And even better ... you get it for free!

For the IL geeks, the latter code fragment produces the following IL:

1 .method private hidebysig static void Swap(int32[] 'array', 2 int32 i, 3 int32 j) cil managed noinlining 4 { 5 // Code size 16 (0x10) 6 .maxstack 4 7 .locals init (int32 V_0) 8 IL_0000: nop 9 IL_0001: ldarg.0 10 IL_0002: ldarg.1 11 IL_0003: ldelem.i4 12 IL_0004: stloc.0 13 IL_0005: ldarg.0 14 IL_0006: ldarg.1 15 IL_0007: ldarg.0 16 IL_0008: ldarg.2 17 IL_0009: ldelem.i4 18 IL_000a: stelem.i4 19 IL_000b: ldarg.0 20 IL_000c: ldarg.2 21 IL_000d: ldloc.0 22 IL_000e: stelem.i4 23 IL_000f: ret 24 } // end of method Program::Swap

Observe the noinlining implementation flag on line 3 which tells the JIT compiler not to inline the method. Needless to say, you could also ildasm /out the original .exe file, add noinlining and re-ilasm the IL code to an .exe file ("roundtripping") to illustrate the same instead of using MethodImplAttribute as shown above.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# re: Inlining - yes, it happens

Monday, February 19, 2007 1:54 AM by Mitch Wheat

Hi Bart

The choice of pivot is important, because if the data is already ordered, the vanilla quicksort algorithm exhibits its pathological behaviour of O(N^2).  I don'y know what the framework uses under the covers (I'd be interested to know...) but it's hard to beat Sedgewick's Median of three (i.e pivot) with insertion sorting of small subsets for a general purpose sort.

# re: Inlining - yes, it happens

Monday, February 19, 2007 4:18 AM by bart

Hi Mitch,

I like the dynamism on my blog lately. As you pointed out, the choice of the pivot element (hence the variable name 'p') is important for (almost-)ordered inputs.

The framework uses a quicksort algorithm for Array.Sort with the median-of-3 algorithm to find the pivot value (static private method GetPivotValue in Array). You can inspect the sources in the SSCLI distribution if you want.

-Bart

# Exception Stack Trace difference between Debug and Release mode - Programmers Goodies

Pingback from  Exception Stack Trace difference between Debug and Release mode - Programmers Goodies