Sunday, July 12, 2009 12:08 AM bart

Bart’s Control Library – Not What You Think It Is – Part 1

Introduction

In the last installment of our control library exploration, we kept things relatively simply by looking at the if-statement. In fact, we avoided a bunch of complications that have to do with non-local return constructs like break, continue and return (and put throw on that pile as well if you like). As an implication, we aren’t able to provide full-fidelity loop constructs just yet. Some glue is missing, and today’s write-up reflects the quest for such a tube of glue. Be prepared to be shocked initially though :-).

 

The while-loop naively translated

C#’s while-loop can be considered the mother of all loops. For and foreach derive from it. Okay, there’s still the do-loop as well, but let’s ignore this one for now. It will be simple enough to derive from the while-loop’s treatment. What about a sample loop to translate?

int i = 0;
while (i < 10)
{
    Console.WriteLine(i);
    i++;
}

Yes, I know, it’s a for-loop (almost, what’s missing?) but bear with me. There’s one observation to be made here that contrasts the while-loop’s structure from the if-statement one’s: the condition expression needs to be reevaluated multiple times, so just passing a Boolean argument to our control flow library CC# function won’t fly. We really need a function here, of type Func<bool>. Other than that, we can trivially (?) translate the loop body to an Action and feed it as an argument to the control flow function:

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    i++;
});

Easy enough, isn’t it? How would such a function be defined, let’s have a look:

static class Control
{
    public static void While(Func<bool> condition, Action body)
    {
        while (condition()) body();
    }
}

But as you already know, we’re missing out on something: support for break and continue. In fact, the Parallel Extensions loop constructor can praise themselves happy a naive approach works for them, as I can’t come up with a meaningful interpretation of break and continue in a parallel loop construct. Well, maybe for break (which could be encoded using a return from the lambda function). But what about continue? Maybe just the same as break in such a context? Who knows.

 

Non-local return

The concept of non-local return is made clear in the following sample:

int i = 0;
while (i < 10)
{
    Console.WriteLine(i);
    if (i > 5)
        break;
    i++;
}

Useless code, but of conceptual importance. If we naively port this code to our naive CC# equivalent, we end up with a compile-time error:

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    Control.If(() => i > 5, () => {
        break;
    });
    i++;
});

In fact, since our mission in last post was nicely summarized to be “eliminating the blue”, we can see we’re failing here already: the break keyword still lights up as primitive syntactical language surface (notice int isn’t to be treated as such, it’s shorthand for Int32 which is a type).

Always breaks at compile-time

Notice as a curiosity you can provide a proof that naive translation of a loop constructs with our current control flow library always yields a compile-time error in case break or continue statements are used. You can see this intuitively, since break and continue only work when embedded in a loop construct (what about switch’s overloaded use of break?). As we’re compiling away all such loop constructs, break and continue statements will end up orphaned.

In fact, we’re lucky it always produces a compile-time error as we’re not spoiling the program’s semantics that way: non-compiling code is the best we can hope for and it works out nicely for break and continue. Use of return is more worrisome:

int i = 0;
while (i < 10)
{
    Console.WriteLine(i);
    if (i > 5)
        return;
    i++;
}

translates into

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    Control.If(() => i > 5, () => {
        return;
    });
    i++;
});

which will compile-fine. But we’ve just changed semantics: return here plays the role of … continue. Why? While we’re repeatedly executing the body through delegate invocation, returning from that delegate won’t stop that repetition. All it does it stopping the current loop body invocation prematurely, after which Control.While will move on to the next iteration. That’s what continue is used for.

Notice how the translation of the code to control flow library method calls gave rise to closures over the outer scope’s variables:

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    Control.If(() => i > 5, () => {
        break;
    });
    i++;
});

We even got to a nested situation here. Pop quiz: how many heap-allocated integer values exist here? (You may be counter-surprised.) Does this pose another semantics-preserving challenge? Why (not)?

 

Staying sane?

Based on the observation in the previous side-bar we can think of a translation where break and continue translate into return statements from the loop body delegate, signaling whether execution should continue or break from the loop. Does this even work? Let’s have a look. Here’s a possible attempt (don’t ask me what it means):

int i = 0;
while (i < 10)
{
    Console.WriteLine(i++);

    if (i > 5)
        return;
    if (i % 2 == 0)
        continue;
    Console.WriteLine(i);
}

becomes

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i++);
    Control.If(() => i > 5, () => {
        return false; // break
    });
    Control.If(() => i > 5, () => {
        return true; // continue
    });
    i++;
    return true; // continue
});

Of course I’m the devil in disguise again, as my not-so-contrived sample above uses break and continue in a nested if-statement (and quite frankly, how many times would you have a plain break or continue statement directly embedded in a loop body, without intermediate statements like if?).

Now we have the problem of having the loop-behavior-controlling return statements buried inside an inner lambda expression that’s used by the if-statement. Things start to become more clumsy every minute. Do we need to tweak Control.If to return a Boolean just because it so happens that it may be used inside a loop? Or should we be emitting code that adds Boolean “shouldBreak” and/or “shouldContinue” variables to the loop body, make break and continue statements set that, and check whether the value has changed at any point it could (e.g. after returning from Control.If). And what if loops are nested? Oh my.

Nested loops

One thing C# (amongst many other imperative languages) doesn’t have is the ability to break or continue from another loop that the immediately enclosing one. I.e. you can’t break from outer loops in any way directly supported by the language, unless you use addition Boolean flags to drive such an execution path manually. No, I’m not advocating such a style of coding would be desirable but it’s worth thinking about how such a thing could be achieved.

What about using nesting levels integers for breaking? For example, break 0 means to break from the directly enclosing loop (where 0 could be optional of course), while break 1 stands for “break from the loop one level up”. Hmm, readable? Sometimes maybe. Also think about refactoring: you’ll have to be extremely careful not to introduce additional breakable statements unless you update the break level numbers manually. Error-prone for sure and a curse for refactoring engines. Visual Studio already avoids refactoring loop bodies if a break or continue statement occurs (“The selection contains a “break” statement but not the enclosing statement”) but you’re not making things better by introducing this new “feature”.

An alternative would be to have loops denoted by symbolic names. If every loop has a name, you could break or continue from it by using that name. So, control flow structures no longer are just anonymous blocks of code, but become named entities. An obvious choice here would be to have control structures represented as objects, so their identifiers become variable (or better, constant) names:

var outer = while (condition1)
{
    var inner = while (condition2)
    {
        // …
        break outer;
        // …
    };
};

So loops no longer are statements but have become expressions (things that produce a value). With some hypothetical call-with-block-juxtaposition-lifting feature (what the ****?), this could be turned into a method call where the block is passed as a lambda.

We have a couple of problems here, but let’s first detail this “call-with-block-juxtaposition-lifting” animal. Juxtaposition simply means “two things written adjacent to one another”. Some languages even make juxtaposition a proper (overloadable) operator. In mathematics juxtaposition is often used to denote multiplication: a b stands for a * b. Here we’re concerned with juxtaposition of a method call and a code block:

method(args) { /* this is a block */ };

If blocks of code are shorthand for Action delegates, this actually corresponds to something along those lines:

Action __temp = () => { /* this is a block */ };
method(args) __temp;

What we want is something that allows to “eat” the object following the method call, and lift it into the call itself. In fact, this is the mess we’re in for using parentheses around argument lists (as opposed to whitespace-separated argument lists like add 1 2 in various functional languages). But you’ll agree lots of built-in primitives looks like this: keyword (args) { /* block */ }: using, lock, while, if, switch, for, do, etc. What we don’t want is to add syntactical noise to the block-passing code: “, () =>)”. This is what you see in the Parallel Extensions’ loop constructions.

In other words, the “method” above would indicate its desire to allow a juxtaposed argument at the end somehow:

WhileLoop while(Func<bool> condition, juxtaposed Action body) { … }

That fancy juxtaposed keyword would have similar rules as params in that it can only occur at the end. Of course you also want to drop the () => … noise for specifying the delayed condition, so that you can write the following:

var loop = while (condition)
{
    // Loop body
};

to stand for the equivalent:

var loop = while(() => condition, () =>
{
    // Loop body
});

Overload resolution could be the driver here, but it can get quite (and maybe too) subtle: if no Boolean first parameter exists, but a Func<Boolean> does exist the expression gets lifted in a lambda expression for delayed evaluation. In fact, you’re adding a calling mechanism next to by-value (the default) and by-ref (needs to be specified on both sides using “ref”)… And no, decolorization of while above is not a mistake as it has become a (global?) method call.

Now the remaining problem is how to refer to the loop variable’s name from inside the block itself. We’re hitting a violation of definite assignment:

Action blowUpStack = () => { blowUpStack(); };

The compiler rightfully complains about blowUpStack not to be assigned inside the body of the lambda expression with statement block. Yes, you’re right, it could be made smarter because it knows that the code in the lambda body can’t execute before the assignment has happened, but it isn’t. We could either use a fixpoint to solve this issue, or just do a bogus intermediate assignment:

Action blowUpStack = null;
blowUpStack = () => { blowUpStack(); };

Unfortunately for our while-loop code we’d loose our handy use of local variable type inference if we were to do so, as the initial value of “null” has no type in and by itself. Okay you could write the following, but you’ve just pushed the type to the other side:

var loop = default(WhileLoop);

Either way, all of this is not the point we’re making here. The language syntax could allow a loop to be assigned to a variable, and that variable could be used inside the loop to specify break or continue statements targeting it. Now, believe it or not but we’re gonna get such functionality for free in our mission to create a better control library that’s friendly in the face of non-local return.

What we learn from such a hypothetical juxtaposition mechanism is it can permit for dropping syntactical noise and allow people to write their own control flow structures simply by declaring fancy methods. For example, lots of people misuse the using statement to do performance measurement by implementing a Stopwatch-wrapping IDisposable implementation. If one could write a measure method with a juxtaposed block tagged to the end, allowing it to be called as if it were a proper keyword:

measure
{
    // Code goes here.
}

Extensible syntax? Not really, just much cleaner method calls. Language future-limiting? The best you could hope for is “call is ambiguous” error messages if the language itself decides to add a construct the user has invented already. That would happen if such additions would be themselves implemented as methods (possibly with compile-time rewrite treatment as shown in our previous post, to turn them into real efficient IL) which are always in the global scope. If the user defines a construct herself, e.g. the same while method as the compiler did, it’d have to be disambiguated by qualifying the method (MyExtensions.while or so). You get the idea.

Looks like this second naive approach is going to leave us with headaches once more: signaling break and continue is contagious to all sorts of statement functions that need to propagate loop behavior.

 

Going insane

A possible approach to tackle this non-local return requirement is by using a well-known non-local return mechanism in the CLR. You can feel it coming: we’re going to use “exceptions” as “regulars”. Performance folks are likely about to open a Facebook group on the spot against this practice, but I don’t care. Remember the mission of this whole series is to think about how things can be done and nothing more? It’s not coincidental this series is tagged as “Crazy Sundays”. And, as I’ve shown in the previous post, it’s possible to envision a compile-time rewrite mechanism that would translate use of the original primitives into efficient jump-based IL. In such a world we’d not be loosing anything; we’re gaining extensibility and generalization of built-in concepts. That’s what a theoretician loves: reducing concepts. Praise yourself lucky I’m not going bananas again with S and K.

So, how are we going about translating the following fragment in a semantics-preserving way without loosing break and continue usage?

int i = 0;
while (i < 10)
{
    Console.WriteLine(i++);

    if (i > 5)
        break;
    if (i % 2 == 0)
        continue;
    Console.WriteLine(i);
}

Recall we’re gaining control over the whole loop’s execution by turning this into a method. We’re playing a mini virtual machine essentially and just need a way for the user to signal a jump needs to be carried out. Such jumps need to be scoped to the enclosing loop, or in fact any loop, requiring us to have a mechanism to refer to a loop. In the previous side-bar we’ve seen an approach that’d name a loop by assigning it to a variable, but that’s a bit roundabout. We only need a way inside the loop body to refer to the loop. Since the loop’s body is represented as a lambda expression, what about parameterizing that one with a “control flow control variable” (a CFCV if you’re paid for creating fancy words rather than focusing on the technology itself <g>).

Here’s what we want to be able to write:

int i = 0;
Control.While(() => i < 10, ctrl =>
{
    Console.WriteLine(i++);

    Control.If(() => i > 5, () => {
        ctrl.Break();
    });
    Control.If(() => i % 2 == 0, () => {
        ctrl.Continue();
    });
    Console.WriteLine(i);
});

Wow, all the significant blue is gone. Bart is happy. Signaling a break or continue is done by calling the corresponding method on the CFCV of the corresponding loop. Notice this allows nested loops to break out of the not immediately enclosing loop as well. We’ve gained flexibility in fact.

Nested loops, again

Recall our sample from the previous side-bar? Let’s rewrite it here:

var outer = while (condition1)
{
    var inner = while (condition2)
    {
        // …
        break outer;
        // …
    };
};

Think away the var and maybe put the identifiers somewhere else (like “while (…) as name” or so, whatever) and imagine the translation of “break outer” to be “outer.Break()”. You’ve just enabled this generally applicable approach through a simple syntax-directed translation. In fact, I’ll detail such translations more formally later on.

The key question is of course how to make this work in the control library implementation. First of all, notice we’re not contagiously affecting other statements. In the sample above, the Control.If can continue to work as it did before. It just so happens its block contains a call on the lambda parameter from the outer (loop’s) lambda statement. That’s a perfectly valid thing to do, we don’t need any infrastructure to signal break or continue behavior through other kinds of statements, as exceptions (well, “regulars”) will take care of that.

Exception cost

The problem with exceptions is they serve a very particular use of non-local control flow. The kind that can span multiple stack frames, allows for filters requiring a two-pass stack walk treatment, and is layered on SEH to preserve full-fidelity exceptions in the face of native code calling into it. Non-exceptional use of exceptions is not something the CLR is optimized for. Though the mechanics are exactly what we need here, it’d have to be promoted to a first-class control-flow mechanism with its own performance optimizations to be considered a viable option. That this is not a crackpot idea is shown by other platforms.

Let’s reveal the goodies now. Here’s what the While method looks like:

public static void While(this Func<bool> condition, Action<LoopControl> body)
{
    var control = new LoopControl();

    while (condition())
    {
        if (!control.Do(body))
            break;
    }
}

At its heart, not surprisingly this is simply a while-loop. Recall the whole idea is to allow people to override the default behavior (a good idea or not, let’s leave that in the middle), in which case the implementation would be quite different (maybe peppering some additional checking onto the loop body, as in while-if). And the above could be compile-time rewritten to the original form as I alluded to before: call-sites of LoopControl.Break and LoopControl.Continue would be replaced by the original jump mechanisms.

Notice the loop itself only has a break-statement inside it. The continue is realized somewhere else. How this works is not unsurprisingly hidden behind the LoopControl.Do method, which obviously will execute the body and somehow get to know whether or not a break or continue is requested. If a continue is requested (or nothing has been requested at all and the body ran to completion), Do will return true, in case of a break it will return false instead. So, what does Do do?

interface ILoopControl : IControl
{
    void Break();
    void Continue();
}

sealed class LoopControl : ILoopControl
{
    public void Break()
    {
        throw new LoopControlException(this, LoopControlAction.Break);
    }

    public void Continue()
    {
        throw new LoopControlException(this, LoopControlAction.Continue);
    }

    internal bool Do(Action<LoopControl> body)
    {
        try
        {
            body(this);
        }
        catch (LoopControlException ex)
        {
            if (ex.Block != this)
                throw;

            if (ex.Action == LoopControlAction.Break)
                return false;
        }

        return true;
    }
}

enum LoopControlAction
{
    Break,
    Continue
}

Ignore the interfaces for now (IControl is plain empty in fact). Those are their for extensibility scenarios I’m not going to talk about just yet. The main idea is inside LoopControl where Break and Continue signal a control flow request by means of an exception. Do itself has as its main task to execute the body, passing in the loop control object as the parameter on which the user can signal such requests. This also allows the loop controller to know which loop has to be broken out of, which is what the first part of the LoopControlException handler does: is the request for us or for something higher up? Notice the use of a filter would be useful here.

The two cases of a LoopControlException are straightforward to see from the code. In case the exception was used to request a break, we return false spot on. Otherwise, we fall through (and the body execution has stopped anyway at the point the exception for continue was thrown) and return true to say the loop can continue to move on.

Pretty simply glue after all, isn’t it?

 

A word on compile-time rewrite

To get the whole loop code back to the original form, which leads to efficient IL and eliminates the cost of exceptions in use for the runtime-assisted execution form, we need to do an expression tree to expression tree rewrite operation. Since .NET 4.0 we have statement trees in the System.Core library’s System.Linq.Expressions namespace, so one could envision those to be the object model compilers will use to represent code. Today that’s the case for IronRuby and IronPython, but this is feasible for C# and VB as well. Assuming you are granted access to those trees by some mechanism (e.g. our [CompileTimeRewrite] attribute), you could rewrite them e.g. to implement an optimizer or to weave in aspects, etc.

Below is the .NET 4.0 way to new up an AST that is the equivalent to our code with the control flow structures library:

var ctrl = Expression.Parameter(typeof(LoopControl), "ctrl");
var cwi = (from mi in typeof(Console).GetMethods()
           where mi.Name == "WriteLine"
           let p = mi.GetParameters()
           where p.Length == 1 && p[0].ParameterType == typeof(int)
           select mi).Single();
var i = Expression.Variable(typeof(int), "i");
var m = Expression.Block(new [] { i },
    Expression.Assign(i, Expression.Constant(0)),
    Expression.Call(
        typeof(Control).GetMethod("While"),
        Expression.Lambda(Expression.LessThan(i, Expression.Constant(10))),
        Expression.Lambda(
            Expression.Block(
                Expression.Call(cwi, Expression.PostIncrementAssign(i)),
                Expression.Call(
                    typeof(Control).GetMethod("If"),
                    Expression.Lambda(Expression.GreaterThan(i, Expression.Constant(5))),
                    Expression.Lambda(
                        Expression.Call(ctrl, typeof(LoopControl).GetMethod("Break"))
                    )
                ),
                Expression.Call(
                    typeof(Control).GetMethod("If"),
                    Expression.Lambda(
                        Expression.Equal(
                            Expression.Divide(i, Expression.Constant(2)),
                            Expression.Constant(0)
                        )
                    ),
                    Expression.Lambda(
                        Expression.Call(ctrl, typeof(LoopControl).GetMethod("Continue"))
                    )
                ),
                Expression.Call(cwi, i)
            ),
            ctrl
        )
    )
);

The debug view of this code looks as follows, which starts to look a bit as fragmented C# code if you look from a distance :-). But rest assured it’s equivalent to the original form.

image

Notice closures over outer variables is something that isn’t reflected in the expression trees at this stage. Code to hoist locals for features like closures or iterators is something that’s deferred till a later stage of the compilation. Next, we need to have a look at our target which is the expression tree corresponding to the plain old while-loop in C#:

var cwi = (from mi in typeof(Console).GetMethods()
           where mi.Name == "WriteLine"
           let p = mi.GetParameters()
           where p.Length == 1 && p[0].ParameterType == typeof(int)
           select mi).Single();
var i = Expression.Variable(typeof(int), "i");
var @break = Expression.Label("break");
var @continue = Expression.Label("continue");
var m = Expression.Block(new [] { i },
    Expression.Assign(i, Expression.Constant(0)),
    Expression.Loop(
        Expression.Block(
            Expression.IfThen(
                Expression.Not(Expression.LessThan(i, Expression.Constant(10))),
                Expression.Break(@break)
            ),
            Expression.Call(cwi, Expression.PostIncrementAssign(i)),
            Expression.IfThen(
                Expression.GreaterThan(i, Expression.Constant(5)),
                Expression.Break(@break)
            ),
            Expression.IfThen(
                Expression.Equal(
                    Expression.Divide(i, Expression.Constant(2)),
                    Expression.Constant(0)
                ),
                Expression.Continue(@continue)
            ),
            Expression.Call(cwi, i)
        ),
        @break,
        @continue
    )
);

which looks like this in the debugger visualizer:

image

Notice there’s no concept of While or Do. There’s just Loop, with break and continue labels peppered on it. The goal of a compile-time rewriter would be to (recursively of course):

  • Turn a .Call for Control.While into a .Loop with the negated condition lambda expression’s body turned into an .If … .Break block.
  • Maintain a table that maps the LoopControl-typed ParameterExpression fed in to the second argument of Control.While onto the corresponding .Loop node from which the .Break and .Continue labels for that loop can be found.
  • Turn instance method calls to LoopControl.Break and LoopControl.Continue into .Break and .Continue statements targeting the corresponding loop’s labels based on the backwards mapping through the table being built up.
  • Make the aforementioned table scope-based, i.e. cleaning up entries upon leaving a loop body’s scope.

This is left as an exercise for the reader. In fact, the expression tree above can trivially be compiled into executable IL at runtime already in .NET 4.0:

Expression.Lambda<Action>(m).Compile()();

The latter pair of parentheses simply calls through the resulting delegate, with the following result:

image

 

Wrapping up

Time to conclude for today. We’ve seen how some basic non-local returns can be fueled by exceptions, here used as “regulars”. One could envision the classic usage of those to be optimized by compile-time rewrite mechanisms, so we’re not loosing anything, we’re only gaining stuff:

  • Breaking from and continuing loops other than the immediately enclosing one.
  • Potential for extensibility, where control flow structures simply are resolved by means of method (overload) resolution.
  • An idea of simplifying method calls taking in code blocks.
  • Unified treatment of all control flow structures.

Next time we’ll play the same tricks for other loop structures and see how dreaded some control flow structures are to translate like this. We’ll also show a more formal syntax-directed translation using a denotational semantics approach. This concludes my 11th of July (some Flemish holiday) “celebration” in a rather geeky manner.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under:

Comments

# re: Bart’s Control Library – Not What You Think It Is – Part 1

Sunday, July 12, 2009 2:59 AM by Aaron Powell

I can't work out if you're insane or a genius.

I can definately say that reading this post and part 0 in a single sitting had really made my head hurt.

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #388

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #388

# re: Bart’s Control Library – Not What You Think It Is – Part 1

Monday, July 13, 2009 1:55 AM by bart

Hi Aaron,

I can settle for insane. For if I really am, it doesn't really matter. And if I am a genious, I should possess enough relativistic powers to downplay the importance of it.

Maybe this stuff needs a couple of reads to be fully consumed. I believe it does. When the code becomes available, I bet you'll see things much more clearly through breakpoint-driven exploration means.

Thanks,

-Bart

# Daily Links for Tuesday, July 14th, 2009

Tuesday, July 14, 2009 4:33 AM by Daily Links for Tuesday, July 14th, 2009

Pingback from  Daily Links for Tuesday, July 14th, 2009

# Bart’s Control Library – Not What You Think It Is – part 2

Tuesday, July 14, 2009 5:07 AM by B# .NET Blog

Introduction In the latest episode in this series I talked about hypothetical compile-time rewriting

# Bart’s Control Library – Not What You Think It Is – Part 1 - Bart De Smet

Tuesday, July 14, 2009 3:40 PM by DotNetShoutout

Thank you for submitting this cool story - Trackback from DotNetShoutout