Saturday, March 26, 2005 12:06 AM bart

Adventures in Comega - part 2 (Streams)

Introduction

Streams in Cw are array-like sequences of elements of a given type whose elements are only produced when they are needed (we call this lazy construction). As a matter of fact, streams are nothing more than autogenerated classes that the compiler spits out during compilation. However, the concept of streams makes their usage completely transparent because of the automatic implementation of IEnumerable, which provides support for the foreach construct and, as explained further on, for other mechanisms to iterate over the elements.

 

Declaration

The first thing to do is to declare a stream in Cw. As I mentioned before, a stream is a kind of array with elements of a certain type, so we need to declare that type. To indicate that you want a stream, you use the * operator. As an example, consider a stream of integers:

int* a;

C/C++ folks will recognize pointer notation in this. It might help to think of it as one of the ways to declare an array in C/C++, which makes sense because the concept of a stream is based on the concept of an array.

 

Yield

Now it's time to populate the stream. As a stream is a "lazily built array", the system builds it dynamically by "yielding" values into it. A simple approach looks like this:

int* GetStream()
{
     yield return 0;
}

By calling the GetStream method, you'll end up with a stream that contains the single value 0. Not that exciting, but enough to start explaining the concepts a little further. Using the stream looks like this:

void UseIt()
{
     int* a;
     a = GetStream();
     foreach(int i in a)
          Console.WriteLine(i);
}

By executing this code, you'll see ... 0 on the screen. Predictable, I guess. The point is that the GetStream method can execute more than one yield to build the stream. Moreover, you can populate the stream based on decision logic, loops, and so on, like this:

int* GetStream(int s, int e)
{
     while(s <= e)
          yield return s++;
}

By calling GetStream(1,5), you'll get a stream that contains 1,2,3,4,5.
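
For readers who don't have the Cw compiler at hand, the same idea can be sketched with a Python generator. This is only an analogy and not Cw code: get_stream is a hypothetical Python counterpart of the GetStream method above.

```python
def get_stream(s, e):
    """Lazily yield the integers from s up to and including e."""
    while s <= e:
        yield s
        s += 1


# Creating the stream runs none of the loop body yet;
# the values are produced while list() iterates over it.
stream = get_stream(1, 5)
print(list(stream))  # prints [1, 2, 3, 4, 5]
```

As in Cw, the caller only sees something it can iterate over; the loop inside the method advances one step each time a value is requested.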

 

How does it work?

Okay, you've now seen the basic principles of streams and yield. Let's take a look at how this is constructed internally. Because Cw runs on the .NET Framework v1.1, its compiler (cwc.exe) simply generates MSIL code. When you inspect the generated assembly with ildasm, you'll see that the IL code of your GetStream method looks like this:

.method private hidebysig static class System.Collections.Generic.'IEnumerable'
        GetStream(int32 s,
                  int32 e) cil managed
{
  // Code size       31 (0x1f)
  .maxstack  2
  .locals init (class Streams/'closure:765' V_0,
           class System.Collections.Generic.'IEnumerable' V_1,
           class System.Collections.Generic.'IEnumerable' V_2)
  IL_0000:  newobj     instance void Streams/'closure:765'::'.ctor$PST06000007'()
  IL_0005:  stloc.0
  IL_0006:  ldloc.0
  IL_0007:  ldarg.0
  IL_0008:  stfld      int32 Streams/'closure:765'::s$PST04000001
  IL_000d:  ldloc.0
  IL_000e:  ldarg.1
  IL_000f:  stfld      int32 Streams/'closure:765'::e$PST04000002
  IL_0014:  ldloc.0
  IL_0015:  stloc.1
  IL_0016:  br         IL_001b
  IL_001b:  ldloc.1
  IL_001c:  stloc.2
  IL_001d:  ldloc.1
  IL_001e:  ret
} // end of method Streams::GetStream

What's going on here? Quite a lot, but the most interesting part is that the GetStream method creates an instance of a class called "closure:765", which was generated during compilation and is declared as follows:

.class auto ansi sealed nested private specialname 'closure:765'
       extends [mscorlib]System.Object
       implements [mscorlib]System.Collections.IEnumerable,
                  System.Collections.Generic.'IEnumerator',
                  [mscorlib]System.Collections.IEnumerator,
                  [mscorlib]System.IDisposable,
                  System.Collections.Generic.'IEnumerable'
{
} // end of class 'closure:765'

As you can see, the class is nested and implements a bunch of IEnumera* interfaces, both generic and "classic" (notice that the System.Collections.Generic namespace is present in Cw on .NET v1.1 too, whereas generics are one of the big new features of C# 2.0).

Secondly, this class has two privatescope fields, s and e, which the GetStream method uses to pass its parameters through to the nested class:

.field privatescope int32 s$PST04000001
.field privatescope int32 e$PST04000002

Besides this, there's also the field 'current Value', which is used to report the current value of the stream to the caller (via the enumerator):

.field private int32 'current Value'

The real "magic" happens in the MoveNext method, which is called every time the next element has to be retrieved. The contents of this method are quite predictable: based on s and e, it decides which value the stream returns next:

.method public virtual instance bool  MoveNext() cil managed
{
  // Code size       74 (0x4a)
  .maxstack  5
  .locals init (class Streams/'closure:765' V_0,
           int32 V_1)
  IL_0000:  ldarg.0
  IL_0001:  stloc.0
  IL_0002:  ldarg.0
  IL_0003:  ldfld      int32 Streams/'closure:765'::'current Entry Point: '
  IL_0008:  switch     (
                        IL_0015,
                        IL_0046)
  IL_0015:  ldloc.0
  IL_0016:  ldfld      int32 Streams/'closure:765'::s$PST04000001
  IL_001b:  ldloc.0
  IL_001c:  ldfld      int32 Streams/'closure:765'::e$PST04000002
  IL_0021:  bgt        IL_0048
  IL_0026:  ldarg.0
  IL_0027:  ldloc.0
  IL_0028:  ldfld      int32 Streams/'closure:765'::s$PST04000001
  IL_002d:  stloc.1
  IL_002e:  ldloc.0
  IL_002f:  ldloc.1
  IL_0030:  ldc.i4.1
  IL_0031:  add
  IL_0032:  stfld      int32 Streams/'closure:765'::s$PST04000001
  IL_0037:  ldloc.1
  IL_0038:  stfld      int32 Streams/'closure:765'::'current Value'
  IL_003d:  ldarg.0
  IL_003e:  ldc.i4.1
  IL_003f:  stfld      int32 Streams/'closure:765'::'current Entry Point: '
  IL_0044:  ldc.i4.1
  IL_0045:  ret
  IL_0046:  br.s       IL_0015
  IL_0048:  ldc.i4.0
  IL_0049:  ret
} // end of method 'closure:765'::MoveNext

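First, the method switches on the 'current Entry Point: ' field to find out where to resume; in this case both entry points end up at the loop test. Then s is compared to e. If s is greater than e, the stream is exhausted and the method returns false. Otherwise, the current value of s is stored in 'current Value', s itself is incremented (add), the entry point is set to 1 and the method returns true.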
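
To make the IL a bit more tangible, here is a hand-written sketch of the same state machine in Python. It is an illustration only, not the code the Cw compiler generates: the class name RangeClosure and its members move_next, current_value and entry_point are made-up names that mirror MoveNext, 'current Value' and 'current Entry Point: '.

```python
class RangeClosure:
    """Hand-written stand-in for the compiler-generated closure class."""

    def __init__(self, s, e):
        self.s = s                 # captured parameter s
        self.e = e                 # captured parameter e
        self.current_value = None  # plays the role of 'current Value'
        self.entry_point = 0       # plays the role of 'current Entry Point: '

    def move_next(self):
        # Both entry points resume at the loop test, as in the IL switch.
        if self.s > self.e:
            return False           # stream exhausted
        old = self.s
        self.s = old + 1           # increment s for the next call
        self.current_value = old   # publish the value from before the increment
        self.entry_point = 1       # remember where to resume next time
        return True


closure = RangeClosure(1, 5)
values = []
while closure.move_next():
    values.append(closure.current_value)
print(values)  # prints [1, 2, 3, 4, 5]
```

All state that a normal loop would keep in local variables lives in fields here, which is what allows the method to return after every element and pick up again on the next call.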

Finally, Main calls the GetStream method and uses the enumerator to iterate over the collection in order to Console.WriteLine the values to the screen:

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       66 (0x42)
  .maxstack  7
  .locals init (class System.Collections.Generic.'IEnumerable' V_0,
           class System.Collections.Generic.'IEnumerable' V_1,
           class System.Collections.Generic.'IEnumerator' V_2,
           int32 V_3,
           int32 V_4)
  IL_0000:  ldc.i4.1
  IL_0001:  ldc.i4.5
  IL_0002:  call       class System.Collections.Generic.'IEnumerable' Streams::GetStream(int32,
                                                                                                       int32)
  IL_0007:  stloc.0
  IL_0008:  ldloc.0
  IL_0009:  stloc.1
  IL_000a:  ldloc.1
  IL_000b:  brfalse    IL_003b
  IL_0010:  ldloc.1
  IL_0011:  callvirt   instance class System.Collections.Generic.'IEnumerator' System.Collections.Generic.'IEnumerable'::GetEnumerator()
  IL_0016:  stloc.2
  IL_0017:  ldloc.2
  IL_0018:  brfalse    IL_003b
  IL_001d:  ldloc.2
  IL_001e:  callvirt   instance bool System.Collections.Generic.'IEnumerator'::MoveNext()
  IL_0023:  brfalse    IL_003b
  IL_0028:  ldloc.2
  IL_0029:  callvirt   instance int32 System.Collections.Generic.'IEnumerator'::get_Current()
  IL_002e:  stloc.3
  IL_002f:  ldloc.3
  IL_0030:  stloc.s    V_4
  IL_0032:  ldloc.s    V_4
  IL_0034:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0039:  br.s       IL_001d
  IL_003b:  call       string [mscorlib]System.Console::ReadLine()
  IL_0040:  pop
  IL_0041:  ret
} // end of method Streams::Main

Notice the return type for the int*; it's just a generic enumerable of Int32 values. For the geeks, take a look at the get_Current methods of the closure:765 nested class. You'll notice that boxing is used there, which has to do with the non-generic IEnumerator interface: its Current property returns an object, so an int32 value has to be boxed (and unboxed again by the caller). For more information about these issues and the evolution in .NET v2.0, consult the documentation about generics in C# v2.0.
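
The pattern that foreach expands to in the Main method above (get an enumerator, call MoveNext, read Current, repeat) can be written out by hand. The sketch below does so in Python as an analogy: iter() stands in for GetEnumerator, next() combines MoveNext and get_Current, and StopIteration corresponds to MoveNext returning false. The names get_stream and consume are hypothetical.

```python
def get_stream(s, e):
    """Python counterpart of GetStream: lazily yields s..e."""
    while s <= e:
        yield s
        s += 1


def consume(stream):
    """Iterate over stream the long way and collect what was seen."""
    seen = []
    if stream is None:          # mirrors the null check (brfalse) in the IL
        return seen
    enumerator = iter(stream)   # GetEnumerator()
    while True:
        try:
            i = next(enumerator)  # MoveNext() followed by get_Current()
        except StopIteration:     # MoveNext() returned false
            break
        seen.append(i)
    return seen


print(consume(get_stream(1, 5)))  # prints [1, 2, 3, 4, 5]
```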

 

Intermediate wrap-up

So, what did we see so far? By declaring a stream, you're in fact declaring a class that is IEnumerable and builds its contents at runtime by executing yield statements, which are translated into code inside the MoveNext method of the generated enumerator. Thus, a stream builds its contents incrementally while the program is executing, whereas classic collections (arrays or System.Collections objects) are typically built upfront and then iterated over by means of the enumerator (e.g. by using foreach).
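
The difference between incremental and upfront construction is easy to observe when producing an element has a visible side effect. The following Python sketch (again an analogy, with the hypothetical helpers lazy_stream and eager_list) writes to a log each time an element is produced:

```python
def lazy_stream(s, e, log):
    """Produce s..e one element at a time, on demand."""
    while s <= e:
        log.append(f"producing {s}")
        yield s
        s += 1


def eager_list(s, e, log):
    """Produce s..e upfront and return the finished list."""
    result = []
    while s <= e:
        log.append(f"producing {s}")
        result.append(s)
        s += 1
    return result


lazy_log, eager_log = [], []
stream = lazy_stream(1, 3, lazy_log)   # no element produced yet
items = eager_list(1, 3, eager_log)    # all three elements produced now
print(len(lazy_log), len(eager_log))   # prints 0 3

first = next(stream)                   # produces exactly one element
print(first, lazy_log)                 # prints 1 ['producing 1']
```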

 

Even more stuff ... apply-to-all-expressions

But there is more, something we call "apply-to-all-expressions". In my code samples you saw the typical usage of the foreach loop construct to iterate over the values in the collection (in this case, in the stream). Cw supports another construct that doesn't require you to declare a variable to hold the elements of the collection; instead it uses the keyword "it". Basically, you attach a code block to an instance of a stream, and inside that code block "it" refers to the element of the current iteration and has the right type (that is, the type of the elements in the stream). Let me show you:

void UseIt()
{
     int* a;
     a = GetStream();
     a.{ Console.WriteLine(it); };
}

This code can of course be abbreviated to:

void UseIt()
{
     GetStream().{ Console.WriteLine(it); };
}

The code block can also contain multiple statements. When you go back to the IL code for this program, you'll notice two things:

  • The nested closure class has a different identifier.
  • There is another nested closure class in the class.

The second remark is the most interesting one. Locate the original closure and the new one, and open up the new one to look at more details. In my case, the new one is called closure:561 and contains a method called "Function:544" with the following IL code:

.method privatescope instance void  'Function:544$PST06000010'(int32 it) cil managed
{
  .param [0]
  .custom instance void [mscorlib]System.Diagnostics.DebuggerHiddenAttribute::.ctor() = ( 01 00 00 00 )
  .custom instance void [mscorlib]System.Diagnostics.DebuggerStepThroughAttribute::.ctor() = ( 01 00 00 00 )
  // Code size       12 (0xc)
  .maxstack  8
  IL_0000:  ldarg.1
  IL_0001:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0006:  br         IL_000b
  IL_000b:  ret
} // end of method 'closure:561'::'Function:544'

This is what the code block of the apply-to-all-expression is compiled to. One parameter is passed to the method, containing the strongly typed "it" value. The calling method has changed a little too, in order to call this function:

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       99 (0x63)
  .maxstack  9
  .locals init (class Streams/'closure:561' V_0,
           class System.Collections.Generic.'IEnumerable' V_1,
           int32 V_2)
  IL_0000:  newobj     instance void Streams/'closure:561'::'.ctor$PST0600000F'()
  IL_0005:  stloc.0
  IL_0006:  ldloc.0
  IL_0007:  ldc.i4.1
  IL_0008:  ldc.i4.5
  IL_0009:  call       class System.Collections.Generic.'IEnumerable' Streams::GetStream(int32,
                                                                                                       int32)
  IL_000e:  stfld      class System.Collections.Generic.'IEnumerable' Streams/'closure:561'::p$PST04000005
  IL_0013:  ldloc.0
  IL_0014:  ldfld      class System.Collections.Generic.'IEnumerable' Streams/'closure:561'::p$PST04000005
  IL_0019:  stloc.1
  IL_001a:  ldloc.1
  IL_001b:  brfalse    IL_005c
  IL_0020:  ldloc.0
  IL_0021:  ldloc.1
  IL_0022:  callvirt   instance class System.Collections.Generic.'IEnumerator' System.Collections.Generic.'IEnumerable'::GetEnumerator()
  IL_0027:  stfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'::'foreachEnumerator: 2$PST04000006'
  IL_002c:  ldloc.0
  IL_002d:  ldfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'::'foreachEnumerator: 2$PST04000006'
  IL_0032:  brfalse    IL_005c
  IL_0037:  ldloc.0
  IL_0038:  ldfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'::'foreachEnumerator: 2$PST04000006'
  IL_003d:  callvirt   instance bool System.Collections.Generic.'IEnumerator'::MoveNext()
  IL_0042:  brfalse    IL_005c
  IL_0047:  ldloc.0
  IL_0048:  ldfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'::'foreachEnumerator: 2$PST04000006'
  IL_004d:  callvirt   instance int32 System.Collections.Generic.'IEnumerator'::get_Current()
  IL_0052:  stloc.2
  IL_0053:  ldloc.0
  IL_0054:  ldloc.2
  IL_0055:  call       instance void Streams/'closure:561'::'Function:544$PST06000010'(int32)
  IL_005a:  br.s       IL_0037
  IL_005c:  call       string [mscorlib]System.Console::ReadLine()
  IL_0061:  pop
  IL_0062:  ret
} // end of method Streams::Main
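
In other words, the compiler turns the code block into a function with a single parameter named it and calls that function once for every element. A rough Python equivalent, with the hypothetical helper apply_to_all taking the place of the loop in Main, could look like this:

```python
def get_stream(s, e):
    """Python counterpart of GetStream: lazily yields s..e."""
    while s <= e:
        yield s
        s += 1


def apply_to_all(stream, function):
    """Call function once per element, passing the element as 'it'."""
    if stream is None:      # mirrors the null check in the generated IL
        return
    for it in stream:
        function(it)


# The lambda plays the role of the generated 'Function:544' method.
seen = []
apply_to_all(get_stream(1, 5), lambda it: seen.append(it))
print(seen)  # prints [1, 2, 3, 4, 5]
```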

 

Constructing new streams based on existing streams

Based on an apply-to-all-expression you can also build a new stream, for example by converting each element to another type or by calling some method on it. A basic sample looks like this:

string* newStream = GetStream().{ return it.ToString(); };

This is compiled in a similar fashion as the previous example. This time another function is created for the apply-to-all-expression, containing the return it.ToString(); code. But there is more going on, because we are declaring another stream type, based on string this time. This results in another stream class being generated, nested inside the closure:561 class:

.class auto ansi sealed nested private specialname 'closure:1241'
       extends [mscorlib]System.Object
       implements [mscorlib]System.Collections.IEnumerable,
                  System.Collections.Generic.'IEnumerator<System.String>',
                  [mscorlib]System.Collections.IEnumerator,
                  [mscorlib]System.IDisposable,
                  System.Collections.Generic.'IEnumerable<System.String>'
{
} // end of class 'closure:1241'

Inside the MoveNext method you'll find code that calls the conversion function this time:

.method public virtual instance bool  MoveNext() cil managed
{
  // Code size       118 (0x76)
  .maxstack  10
  .locals init (class Streams/'closure:561'/'closure:1241' V_0,
           class System.Collections.Generic.'IEnumerable' V_1,
           int32 V_2,
           int32 V_3)
  IL_0000:  ldarg.0
  IL_0001:  stloc.0
  IL_0002:  ldarg.0
  IL_0003:  ldfld      int32 Streams/'closure:561'/'closure:1241'::'current Entry Point: '
  IL_0008:  switch     (
                        IL_0015,
                        IL_0072)
  IL_0015:  ldloc.0
  IL_0016:  ldfld      class System.Collections.Generic.'IEnumerable' Streams/'closure:561'/'closure:1241'::Collection$PST04000009
  IL_001b:  stloc.1
  IL_001c:  ldloc.1
  IL_001d:  brfalse    IL_0074
  IL_0022:  ldloc.0
  IL_0023:  ldloc.1
  IL_0024:  callvirt   instance class System.Collections.Generic.'IEnumerator' System.Collections.Generic.'IEnumerable'::GetEnumerator()
  IL_0029:  stfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'/'closure:1241'::'foreachEnumerator: 3$PST0400000D'
  IL_002e:  ldloc.0
  IL_002f:  ldfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'/'closure:1241'::'foreachEnumerator: 3$PST0400000D'
  IL_0034:  brfalse    IL_0074
  IL_0039:  ldloc.0
  IL_003a:  ldfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'/'closure:1241'::'foreachEnumerator: 3$PST0400000D'
  IL_003f:  callvirt   instance bool System.Collections.Generic.'IEnumerator'::MoveNext()
  IL_0044:  brfalse    IL_0074
  IL_0049:  ldloc.0
  IL_004a:  ldfld      class System.Collections.Generic.'IEnumerator' Streams/'closure:561'/'closure:1241'::'foreachEnumerator: 3$PST0400000D'
  IL_004f:  callvirt   instance int32 System.Collections.Generic.'IEnumerator'::get_Current()
  IL_0054:  stloc.2
  IL_0055:  ldloc.2
  IL_0056:  stloc.3
  IL_0057:  ldarg.0
  IL_0058:  ldloc.0
  IL_0059:  ldfld      class Streams/'closure:561' Streams/'closure:561'/'closure:1241'::Closure$PST0400000A
  IL_005e:  ldloc.3
  IL_005f:  call       instance string Streams/'closure:561'::'Function:595$PST06000014'(int32)
  IL_0064:  stfld      string Streams/'closure:561'/'closure:1241'::'current Value'
  IL_0069:  ldarg.0
  IL_006a:  ldc.i4.1
  IL_006b:  stfld      int32 Streams/'closure:561'/'closure:1241'::'current Entry Point: '
  IL_0070:  ldc.i4.1
  IL_0071:  ret
  IL_0072:  br.s       IL_0039
  IL_0074:  ldc.i4.0
  IL_0075:  ret
} // end of method 'closure:1241'::MoveNext

Note the nesting depth and the call to the function that performs the conversion, which looks pretty simple:

.method privatescope instance string  'Function:595$PST06000014'(int32 it) cil managed
{
  // Code size       17 (0x11)
  .maxstack  3
  .locals init (string V_0,
           string V_1)
  IL_0000:  ldarga.s   it
  IL_0002:  call       instance string [mscorlib]System.Int32::ToString()
  IL_0007:  stloc.0
  IL_0008:  br         IL_000d
  IL_000d:  ldloc.0
  IL_000e:  stloc.1
  IL_000f:  ldloc.0
  IL_0010:  ret
} // end of method 'closure:561'::'Function:595'
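
Putting the two pieces together: the new stream enumerates the original stream and runs the conversion function on each element just before handing it out. In Python terms this is a generator wrapped around another generator. The names get_stream and apply_and_yield below are hypothetical, and str stands in for the it.ToString() call.

```python
def get_stream(s, e):
    """Python counterpart of GetStream: lazily yields s..e."""
    while s <= e:
        yield s
        s += 1


def apply_and_yield(collection, function):
    """Lazily yield function(it) for every element of collection."""
    for it in collection:
        yield function(it)


# No conversion runs here; each str() call happens during iteration.
new_stream = apply_and_yield(get_stream(1, 5), str)
print(list(new_stream))  # prints ['1', '2', '3', '4', '5']
```

Just like the generated MoveNext method, the wrapper holds on to the underlying collection and the conversion function, and does no work until the next element is requested.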

If you're really interested in this stuff, I'd recommend messing around in the IL a little more to see what's going on. Once you understand the basic tricks, it's fairly easy to see what the magic is all about.

 

More samples?

Comega comes with a bunch of stream examples that are worth checking out. I strongly recommend running ildasm on the generated code to get a better picture of the overall structure and ideas.
