Saturday, July 05, 2008 1:35 AM bart

1.To(3) - Ruby-style Internal Iterators in C#

External or internal?

C# introduced the concept of iterators in C# 2.0 but it's a less-known fact that there are two sorts of iterators. The ones provided in C# are so-called external iterators. The distinction lies in the party that controls the enumeration of the iteration, e.g.

public IEnumerator<int> FromTo(int from, int to)
{
    for (int i = from; i <= to; i++)
        yield return i;
}

does by itself not perform any iteration; it's only when the consumer starts to suck data out of the enumerator object (typically using a foreach loop) that the data is fetched:

foreach (int i in FromTo(1, 3))
    Console.WriteLine(i);

In other words, the object returned by FromTo (which is a little state machine implementing IEnumerator<T> or IEnumerable<T>) just captures the capability of iterating over the produced results in an on-request basis. Notice using C# 3.0 syntax, we can make the use of it 'different', obtaining a more Ruby-like style:

public static IEnumerator<int> To(this int from, int to)
{
    for (int i = from; i <= to; i++)
        yield return i;
}

using an extension method so it can be called like this:

foreach (int i in 1.To(3))
    Console.WriteLine(i);

Still no different than what we started with but as you can guess: besides external iterators we have internal iterators. Internal iterators perform the iteration themselves (they push the generated values into the code body associated with them) and are therefore very similar to built-in loop control structures in the language. A possible implementation looks like this:

public static IEnumerator<int> To(this int from, int to, Action<int> a)
{
    for (int i = from; i <= to; i++)
        a(i);
}

which can be called like this:

1.To(3, i => { Console.WriteLine(i); });

using a C# 3.0 lambda expression. Notice there is no blue (syntax coloring) left, we have defined our own control structure. Nothing fancy but one thing we've actually given up here is our ability to define an object that captures the "iterator's potential" to iterate from 1 to 3 and that can be invoked multiple times. (Whether or not this is a cosmetic detail is left for judgement by the reader.) In other words, we'd like to be able to define a control structure like this:

var myLoop = 1.To(3);
myLoop(i => { Console.WriteLine(i); });
myLoop(i => { Console.WriteLine(i + " again!"); });

What we're doing here is making use of partial function application. It looks like we're calling the To method with just one parameter, allowing the other parameters to be filled in later, realizing higher-order functions. I deliberately obfuscated the type of 1.To(3) in the fragment above to elaborate on it a bit more now. What does the type of 1.To(3) need to look like? Well, we should be able to call it with one parameter of type Action<int>. The result of calling through the 1.To(3) object is purely side-effect based, i.e. the call doesn't return anything by itself, so the return type of the type we're looking for is void. To wrap up, a void-returning callable thing is nothing less than an Action<T> where T is our parameter, an Action<int>, so the resulting type is Action<Action<int>>. This level of "action-indirection" signifies the possible delayed characteristic of the iterator invocation. The reason the delayed execution is optional is obvious since one can call it like this:

1.To(3)(i => { Console.WriteLine(i); });

In Ruby one would write:

1.upto(3) { |i| print i }

 

Implementing internal iterators

So how to implement this? Not that difficult at all:

public static Action<Action<int>> To(this int from, int to)
{
    return a => { for (int i = from; i <= to; i++) a(i); };
}

Notice how the iteration moved inside the "iterator", how it's indirected through one level of lambda-abstraction and now yield keywords have been used (making it no longer an iterator in C# lingo). In fact, there's lots of magic going on in this fragment. First of all, the type inference carried out by the compiler. Since the signature of the To method indicates we're returning an Action<Action<int>> - which is fancy speak for delegate void _(delegate void_(int)) - the compiler can infer that the lambda's parameter a has type Action<int>:

return (Action<int> a) => { for (int i = from; i <= to; i++) a(i); };

Furthermore, lambda's are pure syntactical sugar for anonymous methods (at least in this case where no expression trees are involved):

return delegate (Action<int> a) { for (int i = from; i <= to; i++) a(i); };

The end result with sample looks like this:

image

We could stop here but if you're curious for the technical details that make this work, read on.

 

Behind the compiler curtains

The real question though is what gets returned. This is where closures kick in:

image 

The <>c__DisplayClass1 captures the from and to arguments to the iterator "constructor" call (since that's ultimately what our To call does) and has a method called <To>b__0 that contains our iteration logic:

image

Notice the call through the supplied Action<int> delegate on line IL_000c. An obvious question is whether or not a closure in this case is useful. What we're capturing here are the parameters from and to that are supplied to the To method (that lot's of to's too...). Since closures in C# capture lvals (mimicking copy by ref semantics) there's a code-window where the variables can change and where that change is propagated into the closure. Assume our To-implementation would look like this:

public static Action<Action<int>> To(this int from, int to)
{
    Action<Action<int>> res = a => { for (int i = from; i <= to; i++) a(i); };
    from++; to--;
    return res;
}

when calling To(1,3)(i => { Console.WriteLine(i); }); you'd now see only 2 getting printed on the screen. Why? The locals from and to are getting captured by the closure (also known as captured outer variables, see C# specification §14.5.15.3.1), so wherever one refers to from and to in the local scope the captured variables are getting referenced instead. So by the time a call is made through To(1,3) the captured values have both changed to 2 (from++, to--). However, this closure doesn't propagate outside the scope of the To method, in other words - and again assuming the original implementation of To - you won't see the effects of it when writing code like this:

int f = 1;
int t = 3;
var loop = f.To(t);
f++;
t--;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });

This will still print all 6 expected lines. Let's eliminate our To method for a moment and try to write everything compacted in one place, like this:

Func<int, int, Action<Action<int>>> To = (from, to) => a => { for (int i = from; i <= to; i++) a(i); };
To(1, 3)(i => { Console.WriteLine(i); });
To(1, 3)(i => { Console.WriteLine(i + " again!"); });

Again 6 lines will be printed to the screen. Notice how the (inline) To "method" above has the same signature as our previous extension method To: we take in two ints and return an Action<Action<int>>. Also notice how lambda arrows are right associative, you should read it as:

(from, to) => (a => { for (int i = from; i <= to; i++) a(i); })

What now with closures? Actually, nothing changes. I did say closures capture the outer scope; however, from and to used in the body of the inner lambda refer to the locals from and to defined in the outer lambda, so when calling To(f, t) below:

int f = 1;
int t = 3;
var loop = To(f, t);
f++;
t--;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });

the values f and t are copied into the anonymous method of the outer lambda; where they get captured in a closure used in the inner lambda's body. When you'd loosen the abstraction of the To method like this:

int from = 1;
int to = 3;

Action<Action<int>> To = a => { for (int i = from; i <= to; i++) a(i); };

from++;
to--;

To(i => { Console.WriteLine(i); });
To(i => { Console.WriteLine(i + " again!"); });

there's just one lambda left for which the outer scope is the directly surrounding method, so from and to are captured by a closure and changing them will affect the operation of To.

The closure inside the To method doesn't really gain us anything. Obviously the question is which behavior one wants to realize: one where the loop's boundaries can be changed afterwards (much like closures would allow us to do) or one where the loop object is immutable. To realize the former - if you'd really insist - some creativity is required. 1.To(3) could return an object with properties From and To while also exposing a method called Invoke to invoke the loop with the specified Action<int>. An object with an Invoke method looks pretty much like a delegate, so geeks could hack up IL to create a delegate with From and To properties. Not worth the effort IMO (apart from being a very interesting experiment), immutability is great and this definitely applies to this particular case. However (:-)) if you still insist, there's another escape valve: perform one level of indirection to the arguments to the iterator itself:

public static Action<Action<int>> To(this Func<int> from, Func<int> to)
{
    return a => { for (int i = from(); i <= to(); i++) a(i); };
}

Notice the orthogonal structure between functions/classes (code versus data) and lifted functions/pointers (=> versus * and - the reverse - () versus *). Calling it doesn't look pretty anymore though (partially because the left-hand side used in an extension method doesn't work with a lambda expression since a lambda has no type by itself, it could either be a Func<...> or an Expression<Func<...>> depending on its - in this case non-existent - assignment):

int b = 3;
var loop = (new Func<int>(() => 1)).To(() => b);
b = 2;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });

but here b is caught by a closure, so changing it after obtaining the loop variable will change it's semantics. Brrrr...

 

A dynamic approach

Sticking with immutable internal iterators, we could eliminate the closure and produce the delegate at runtime using Reflection.Emit. This would look as follows:

public static Action<Action<int>> To(this int from, int to)
{
    DynamicMethod m = new DynamicMethod("", null, new Type[] { typeof(Action<int>) });
    ILGenerator ilgen = m.GetILGenerator();

    Label loop = ilgen.DefineLabel();
    Label ret  = ilgen.DefineLabel();

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //      ^^^^^^^^^^^^
    //
    var var_i = ilgen.DeclareLocal(typeof(int));
    ilgen.Emit(OpCodes.Ldc_I4, from);
    ilgen.Emit(OpCodes.Stloc, var_i);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                         ^^
    //
    var var_to = ilgen.DeclareLocal(typeof(int));
    ilgen.Emit(OpCodes.Ldc_I4, to);
    ilgen.Emit(OpCodes.Stloc, var_to);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    // ^^^^
    //
    ilgen.MarkLabel(loop);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                    ^^^^^^^
    //
    ilgen.Emit(OpCodes.Ldloc, var_i);
    ilgen.Emit(OpCodes.Ldloc, var_to);
    ilgen.Emit(OpCodes.Cgt);
    ilgen.Emit(OpCodes.Brtrue, ret);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                                  ^ ^^^^^
    //
    ilgen.Emit(OpCodes.Ldarg_0);
    ilgen.Emit(OpCodes.Ldloc, var_i);
    ilgen.Emit(OpCodes.Callvirt, typeof(Action<int>).GetMethod("Invoke"));

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                             ^^^
    //
    ilgen.Emit(OpCodes.Ldloc, var_i);
    ilgen.Emit(OpCodes.Ldc_I4_1);
    ilgen.Emit(OpCodes.Add);
    ilgen.Emit(OpCodes.Stloc, var_i);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                                          ^
    //
    ilgen.Emit(OpCodes.Br, loop);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                                            ^^^^^^^
    //
    ilgen.MarkLabel(ret);
    ilgen.Emit(OpCodes.Ret);

    return (Action<Action<int>>)m.CreateDelegate(typeof(Action<Action<int>>));
}

Performance-wise you'll see that the latter obviously is slower (and actually on the functional level you don't gain anything since the redundant closure is harmless as we have proven higher up - also concerning memory allocation we're trading closure objects for DynamicMethod and other Reflection.Emit object instances). It shows however how a 3rd party compiler (or interpreter) could emit IL code that realizes an internal iterator with very little code involved (only 17 IL instructions).

 

Other iterators

Of course the "To" operator iterator is just one of the many possibilities. Here's just one of the many:

public static Action<Action<T>> ForEach<T>(this IEnumerable<T> source)
{
    return a => { foreach (T item in source) a(item); };
}

where the indirection of actions is a little artificial, agreed:

new int [] { 1, 2, 3 }.ForEach()(i => { Console.WriteLine(i); });

This is a place where extension properties would come in handy, or obviously we could just write the following instead:

public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (T item in source) action(item);
}

Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# Dew Drop &ndash; July 5, 2008 | Alvin Ashcraft's Morning Dew

Pingback from  Dew Drop &ndash; July 5, 2008 | Alvin Ashcraft's Morning Dew

# re: 1.To(3) - Ruby-style Internal Iterators in C#

Saturday, July 05, 2008 9:51 PM by Daniel Spiewak

Actually, that's still not an internal iterator.  Internal iterators by definition are defined by mutable containers.  What you showed was simply a higher-order function which encapsulates an external iterator (via the for loop).  A true internal iterator would be something like this (example from Ruby):

class ArrayList

def initialize

@data = []

@index = 0

end

def push(e)

@data.push e

end

def get

@data[index]

end

def next

@index += 1

end

def prev

@index -= 1

end

def has_next?

@index < @data.size

end

end

You'll notice that we maintain an index *within* the actual container.  Thus, the process of iterating over the collection requires first changing internal state of the collection itself and then grabbing the value at the current location.  Internal iterators are an inherently imperative construct, whereas higher-order functions are inherently functional.  That's not to say that the concepts cannot mix, but if you are using one, chances are it will obviate the other.

# re: 1.To(3) - Ruby-style Internal Iterators in C#

Sunday, July 06, 2008 12:30 AM by bart

Hi Daniel,

Thanks for your feedback. In fact the point you bring up causes a déjà-vu at my side since a couple of weeks before writing this post - and that's the cause of it - I had a discussion here on iterator patterns with respect to some of my LINQ-related work (cf. pull versus push semantics when brining LINQ to pipeline-driven environments like PowerShell).

Personally I believe it's a matter of purity. I'm a purist myself, albeit a pragmatic one (purity is nice to persue but it shouldn't yield impracticality :-)). The question here is whether or not an encapsulated external iterator can be seen as an internal iterator - it depends where you draw the line between what's considered external versus internal. And that's a fuzzy thing IMO: (!external != internal), due to possible layering...

In this particular case, can the result of the higher-order function be seen as the boundary provider between external and internal? I believe so - to the original collection it's still external but to the user it looks like internal. The phrase "looks like" obviously conflicts with a purist's view considering that the state of the iteration needs to be carried by the object-being-iterated-over itself.

Another way to look at it is with respect to where the cursor is held: purily consumer-side, purily provider-side or somewhere in the middle. The implementation outlined in this post follows the latter pattern, much like a database cursor (the database itself doesn't change nor does the client itself hold the physical cursor, instead you have a party in the middle - typically held server-side though - that captures the iteration's state as part of some bigger session state). One could argue we get in the area of the memento pattern here...

Still, the characteristic of push versus pull is maintained here. The "internal iterator" is cruel with respect to the iteration: once you call it (in this case through the higher-order function) there's no way to control the iteration itself (in terms of a break statement) - it simply pushes all data through the associated action. An "external iterator" on the other hand would keep this control (and is therefore more practical and flexible in most cases).

In a pure world, holding on to a definition that mandates the iteration state to be kept in the object itself, there's no way to provide this pattern without changing the underlying objects. Extension methods are completely external (no instance-level "injection" magic possible), so they are ruled out. The only way one would achieve this is by defining some IIteratable interface (a horrible name):

interface IIteratable<T>

{

   void Iterate(Action<T> action);

}

which would intentionally be asymmetric compared to IEnumerable in that it doesn't define an IIterator<T> retrieved through a GetIterator() method since that would push the iteration state out the underlying object once more. Alternatively one would have methods like HasNext, HasPrevious, MoveNext, MovePrevious, Reset and get_Current but then the client controls the iteration again: we pushed the state inside but leave control to the client, which would again be seen as an external iterator (at least according to The Gang of Four, pg 260).

Afer all, I'm not a huge fan (to say the least) of mutable collections with respect to what I'd call "aspect state" (the fact one can iterate over a collection is rather an aspect to the fact it's a container preserving some well- or ill-defined order). It's too much like friends in C++, and indeed that's oftenly used to implement this pattern in those languages.

I'd say the key take-away of the post is the thing I'm advocating between the lines: functions can serve as great abstractions (sometimes providing imperative-style of constructs in a functional coat, like switch-"expressions") but at the moment developers tend to think more in terms of data-driven decomposition (OO) rather than functional decomposition. In this very particular case, the functional problem decomposition keeps the underlying collection "pure" (just like external iterators do, modifications inside the body set apart obviously) while still maintaining the - IMO - core distinction between internal versus external iterators: who controls the iteration.

Thanks,

-Bart

# re: 1.To(3) - Ruby-style Internal Iterators in C#

Sunday, July 06, 2008 1:51 PM by Daniel Spiewak

Good thoughts, Bart.  I'll agree that the core take-away from your post is quite accurate: functions provide excellent abstractions.  I personally think that they go quite nicely side-by-side with object-oriented abstractions.  C# >=3.0 is really moving that way (at least somewhat with lamdas and delegates) and developers seem to really like the way that a taste of FP inside OOP can be very nice and actually *helpful*.  Scala goes quite a bit further by encouraging a more functional style, but it does this without de-emphasizing object-oriented design and abstractions.

Oh, and I agree, immutable collections are a joy once you get used to them.  They aren't perfect for everything, but I've found myself using them more and more recently and I haven't missed a thing.

# Link Listing - July 6, 2008

Monday, July 07, 2008 12:07 AM by Christopher Steen

Link Listing - July 6, 2008

# Link Listing - July 6, 2008

Monday, July 07, 2008 12:08 AM by Christopher Steen

Sharepoint Having WSPBuilder exclude your dll when creating a .wsp [Via: Sahil Malik ] WPF TemplateBinding:...

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #130

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #130

# Websites tagged "opcodes" on Postsaver

Sunday, January 04, 2009 12:47 PM by Websites tagged "opcodes" on Postsaver

Pingback from  Websites tagged "opcodes" on Postsaver