Saturday, July 11, 2009 5:12 AM bart

Bart’s Control Library – Not What You Think It Is – Part 0

Wow, can’t believe how long the blog silence over here has been. Things have been quite hectic on my side the last few months, virtually running two jobs in parallel. One to cooperate in shipping our upcoming .NET 4.0 release, and one for side projects, more explorations of type systems literature and recent innovations, but most importantly a quite big writing project. More about that later, but let this opening paragraph suffice as a lame argument for keeping your RSS feed under control. What about changing that now?

 

Introduction

As stated repeatedly, the LIN part in LINQ (for Language Integrated) is a matter of syntactical sugar. Sweet sugar, but still. It’s all about translation of keywords into method calls with (mostly) lambda expression arguments. To confirm this characteristic for yourself, refresh your mind using my C# 3.0 Query Expression Translation Cheat Sheet.

Sometimes those translations can get quite complicated due to transparent identifiers, but ultimately no magic is left in the translated form. In fact, we’re compiling query expression syntax away to a subset of C# 3.0 without LINQ. And from there, a translation exists to turn the resulting C# 3.0 program into a C# 2.0 equivalent. C# 3.0 features (including LINQ) are therefore lost in translation.

image

This makes people wonder: if C# 3.0 is so expressive to accommodate for the LINQ syntax and semantics we’re used to, what else is there to be done? Notice though C# 3.0 is not strictly more powerful than C# 2.0: for every C# 3.0 program, a semantically equivalent C# 2.0 program can be formulated. Going further back in time to C# 1.0 poses troubles due to the use of generics and such. We’ll not go there.

What about C# 4.0 to C# 3.0?

The cutting edge reader will wonder whether C# 4.0 programs can be translated into a C# 3.0 equivalent form. In fact, this is not longer the case. Not because of changes to the CLR (as far as I know there are no bare essential changes to CLR that are leveraged by C# 4.0, except for no PIA maybe which leverages a mix of compiler features and enhancements to the CLR type system) but rather because of the lack of semantically equivalent features in C# 3.0 to turn the translation into. For example, optional parameters use a metadata construct and so do generic co- and contra-variance annotations.

What about dynamic? In fact just a bunch of CallSite objects with use of C#-specific binder types. Assuming you have the new assemblies containing those types available, a C# 3.0 equivalent form for the generated code can be written down. Maybe surprising that the flagship feature of C# 4.0 seems translatable to C# 3.0 while some other relatively minor features aren’t.

I’m not saying the generated code for dynamically dispatched calls is simply to write by hand. Check it for yourself. Do you prefer the next fragment …

dynamic d = "Bart";
string s = d.ToLower();
Console.WriteLine(s);

… over this one?

object d = "Bart";
var site1 = CallSite<Func<CallSite, object, string>>.Create(
                Binder.Convert(
                    typeof(string),
                    CSharpConversionKind.ImplicitConversion,
                    false
                )
            );
var site2 = CallSite<Func<CallSite, object, object>>.Create(
                Binder.InvokeMember(
                    CSharpCallFlags.None,
                    "ToLower",
                    typeof(Demo),
                    null,
                    new [] {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                    }
                )
            );
string s = site1.Target(site1, site2.Target(site2, d));
Console.WriteLine(s);

You’ll agree the latter one is much more clear (for geeks at least) <g>.

Either way, people only start to see the possibilities of the C# 3.0 features such as lambda expressions. But sometimes they do quite naively. And that’s where this blog post series comes in. But first, have a look at the parallel extensions to the .NET Framework with its Parallel loop constructs, such as Parallel.For. Thanks to the availability of lambda expressions, calling this method becomes very much like a regular for-loop:

Parallel.For(1, 10, i => {
    // Do something with i.
});

versus

for (int i = 1; i <= 10; i++) {
    // Do something with i.
}

Okay, there are a few differences in that there’s no iteration expression and no explicit loop condition, and the type for the variable is tied to int. But syntactically the shape looks pretty much as the built-in for.

 

Eliminating “the blue”

Lots of languages have an big syntactical surface. Some more than others of course, but still. C# definitely isn’t in the minimalistic camp. This is more of a philosophical thing, but it suits the message of the blog post very well. Just look at the following code snippet and observe the number of keywords used here:

image

Keywords are indicated with blue, hence the title of this section. The question we ask ourselves in this journey is whether we can do with less “language primitives”, potentially buying us additional advantages actually. In other words, do we really need loops? What about if-statements or switches? Exception handling “primitives”? Can we demote them to syntactical sugar over patterns of more essential building blocks like method invocation and lambda expressions?

Those are all valid questions to ask. But even the simplest loops in the fragment above aren’t very straightforward to turn into such a pattern. Pause here and think about it for a while. Get your inspiration from the Parallel.For. You’ll conclude there’s something going on here that comes and bites you. I’ll leave it an open-ended question for just a while.

 

Warming up with if

If all of this sounds like fannying about, it may help to see a sample of what we’re trying to do here. Notice this is purely a brain exercise with lots of implications. Some good, some bad, some neither of the two. Don’t ask just yet “why”, focus on “how” for now.

So here it is: the if-statement. In essence a branching mechanism to execute blocks of code conditionally based on a Boolean-valued expression. Obviously this implementation has a huge benefit: branch instructions are a low-level IL primitive that yields great performance. Why would we want to trade that for something else? Bad question for now. Let’s first see whether it’s feasible. And believe me, ultimately in this series you’ll say: “not that much of a crackpot idea after all”.

Starting simple, what does an if-statement look like exactly? Different forms exist:

if (expression)
    embedded-statement

and

if (expression)
    embedded-statement
else
    embedded-statement

We want to turn both in cosmetically minimally pleasing method calls, effectively compiling away the blue words. For now, we’ll just go with static methods and want to end up with something like this:

Control.If(expression, () => {
    embedded-statement
});

and

Control.If(expression, () => {
    embedded-statement
}).Else(() => {
    embedded-statement
});

The reason the latter form leverages a fluent interface design (tagging and else-clause on by “dotting into” Else) will become clear later. And in fact, we’ll tweak the form a bit as we go forward to enable some nice extensibility capabilities (oh my), but let’s keep things relatively easy for now. How to implement the guys above?

In fact, the second form above is a bit subtle already. So, let’s think away the Else call for now and just provide two If method overloads. One that only has a positive-case embedded-statement, and another one that has an else-clause too. Obviously the blocks are represented as Action delegates though a functional variant could be thought of (see further). Here’s a sample implementation:

static class Control
{
    public static void If(bool condition, Action @if)
    {
        if (condition) @if();
    }

    public static void If(bool condition, Action @if, Action @else)
    {
        if (condition) @if(); else @else();
    }
}

which would be invoked along those lines:

Control.If(age > 18, () => {
    AllowAccess();
}, /* else */ () => {
    DenyAccess();
});

I hear the reader scream: boring. Right, but things will improve soon. To turn this into a functional if, i.e. a conditional expression really, one could add a generic parameter TResult and replace Action by … Hmm, by what really? Just Func<TResult>? Not really, since the first overload needs to produce “nothing” in case the condition did evaluate false. What we really need here is a Maybe<T> construct, similar to Nullable<T> but that works over every type. Let’s not go there in the context of this journey.

Observe how we’ve trivially pushed down the keywords into methods. Why would we do so? Because methods could really be overloaded. Static ones not directly of course in the strict sense of the word. But if we turn them into extension methods, sure enough. Assume the if-statement syntax stood for:

expression.If(() => { embedded-statement }).Else(() => { embedded-statement });

Based on the static type of expression, either an instance method would be chosen (but what for the dreaded null reference?) or an extension method would be chosen (no risk for null dereference exceptions, so there’s a workaround). Or nothing would be found of course. Here’s how it would work for the supported Boolean condition expression, again ignoring the .Else part:

static class Control
{
    public static void If(this bool condition, Action @if)
    {
        if (condition) @if();
    }

    public static void If(this bool condition, Action @if, Action @else)
    {
        if (condition) @if(); else @else();
    }
}

But that’s clearly very wasteful in case of a plain old Boolean. We’ve just made the if-statement extensible and overridable, at the cost of superior performance in the regular case. A hypothetical compiler could special-case Boolean or “party along” and keep itself honest. Here we enter meta-programming facilities. Assume for a moment a hypothetical future version of the language has statement trees and we could declare our extension methods for Boolean-based if as follows:

static class Control
{
    [CompileTimeRewrite(typeof(Hypohetical.Future.CSharp.Compiler.SecondPhase.Rewriter))]

    public static void If(this Expression<bool> condition, Expression<Action> @if)
    {
    }

    [CompileTimeRewrite(typeof(Hypohetical.Future.CSharp.Compiler.SecondPhase.Rewriter))]

    public static void If(this Expression<bool> condition, Expression<Action> @if, Expression<Action> @else)
    {
    }
}

Here I’m also fantasizing about a hypothetical Lazy<T> construct in an expression tree format, written as Expression<bool>. Essentially it’d mean the Boolean expression gets packaged up in an expression tree, just like assigning a lambda expression to an Expression<T> today (with T a delegate type) causes an expression tree to be generated.

The idea of the CompileTimeRewrite attributes above is to detour the code generation through a rewriter that takes in the MethodCallExpression (assuming such a code object model would be tightly integrated with the compiler) for the method it has been tagged on to. A Rewriter type would essentially have an entry-point that gets the whole expression tree and returns a rewritten expression tree, which is then fed in to the next phase of compilation where trees turn into IL code. In other words, the Boolean case would be rewritten at compile time and have no performance penalty as the emitted code could be the same as the original. However, we’ve gained some extensibility mechanism and managed to keep the front-end honest by introducing a rewriting concept.

We still want to make control over “keyword overloading” more granular. For the if-statement that’s a bit of overkill you may say, but we’ll see such a thing come back in subsequent posts, so let’s introduce it here. We want to bring the .Else call back so that

Control.If(age > 18, () => {
    AllowAccess();
}, /* else */ () => {
    DenyAccess();
});

turns into

Control.If(age > 18, () => {
    AllowAccess();
}).Else(() => {
    DenyAccess();
});

One keyword, one method. Just like LINQ (give or take some operators that are more fancy). Hmm, but how can we make this work? Contrast to LINQ we don’t want to delay execution of the if-statement somehow. It really needs to be eager. The age > 18 part is already evaluated because methods are called by value in C# (no by need semantics as in Haskell). Say it evaluated true, then the If method needs to make sure the Else following it doesn’t have an effect. The call will be there, but we want it to be a no-op. If the condition was false, the action fed in to If should not execute but if there is an Else method tagged on, we need to run that one’s action block.

To make this work, we can come up with a handy trick. First we use the idea of encoding states through the language grammar by means of types. To be able to call Else after calling If, the If method needs to have a return type that exposes an Else method. After an Else we don’t want users to dot into anything more, so Else should return void. So, we end up with the following for the entry-point If method that’s always required:

public static IfBlock If(this bool condition, Action @if)
{
    return IfBlock.If(condition, @if);
}

The IfBlock type will provide an Else method on its turn, that’s clear. But what does IfBlock.If have to do internally? Well, if the condition is true, it can readily execute @if but should still return an IfBlock object that is immunized for subsequent Else calls. Otherwise it should return an object that will execute an else-block if the corresponding Else method is ever called. Here’s the trick:

class IfBlock
{
    internal static IfBlock Instance = new IfBlock();

    public virtual IfBlock ElseIf(bool condition, Action elseIf)
    {
        return IfBlock.If(condition, elseIf);
    }

    public virtual void Else(Action @else)
    {
        @else();
    }

    internal static IfBlock If(bool condition, Action action)
    {
        if (condition)
        {
            action();
            return IdleIfBlock.Instance;
        }

        return IfBlock.Instance;
    }
}

class IdleIfBlock : IfBlock
{
    new internal static IdleIfBlock Instance = new IdleIfBlock();

    public override IfBlock ElseIf(bool condition, Action @else)
    {
        return this;
    }

    public override void Else(Action @else)
    {
    }
}

Start with the If factory method. If the condition is true, we execute the if-clause action straight away and return the IdleIfBlock singleton. The behavior of that object is to do nothing upon calling Else. Otherwise, if the condition was false, we return the IfBlock singleton. That one simply runs the action passed to Else upon such a call. ElseIf is a bonus one. The idea is simply: once idle, always idle. It effectively means “we already executed a branch of the multi-way if-statement”, no-op’ing all subsequent branches or conditions.

Caveat

This stuff is tricky. We’ve already introduced a big caveat by naively adding ElseIf. Can you see why? Call by value semantics are going to bite us here. Consider the following fragment:

if (person.Age < 18)
    embedded-statement
else if (person.Age > 21)
    embedded-statement 
else
    embedded-statement

If we turn this into our “Control C#” (CC#) language, we end up with:

Control.If    (person.Age < 18, () => { … })
       .ElseIf(person.Age > 21, () => { … })
       .Else  (                 () => { … });

What if person.Age has a side-effect? How many times does it occur for the fragment above and how many times for the original fragment? Right, one time for the C# fragment, two times for the CC# fragment due to call by value evaluation for the ElseIf call. How do we eliminate this problem?

First of all, observe C# doesn’t have a proper elseif (only #elif in the preprocessor language, which not really is a preprocessor in fact). Instead, the code above is equivalent to:

if (person.Age < 18)
    embedded-statement 
else
    if (person.Age > 21)
        embedded-statement 
    else
        embedded-statement

In fact we’re entering the realm of the dangling else problem here. This one here is a true believer in significant whitespace. Either way, we can avoid the problem by staying faithful to the original language’s behavior and not try to invent ElseIf here:

Control.If(person.Age < 18, () => {
   …
}).Else(() => {
   Control.If(person.Age < 21, () => {
      …
   }).Else(() => {
      …
   });
})

Alternatively we should delay the effect of the evaluation of the condition arguments: every bool argument so far becomes a Func<bool> and every evaluation of the argument becomes a delegate call. Inside the IdleIfBlock we never call through such delegates, therefore eliminating the unwanted side-effects once the control flow structure enters the idle state.

My current CC# implementation uses delayed execution to accommodate for this problem and still has the ElseIf method as a source for some brain-teasers. The transformation of the shown code above to the delayed one is trivial. No worries, by the end of this blog series you’ll get the whole source to play with.

 

Cooling down again

That was a pretty heavy warm-up exercise, wasn’t it? Fairly trivial stuff after all but lots of hidden pitfalls and some hacking required to get the desired result. It’s clear what our next challenges in this series will be: eliminate other control flow structures, preserving their full flexibility and semantics. Based on the above exercise, can it get much harder you may wonder? In fact, it does.

Recall my little mental challenge mentioned in the Eliminating “the blue” section:

Those are all valid questions to ask. But even the simplest loops in the fragment above aren’t very straightforward to turn into such a pattern. Pause here and think about it for a while. Get your inspiration from the Parallel.For. You’ll conclude there’s something going on here that comes and bites you. I’ll leave it an open-ended question for just a while.

Have a glimpse over the prime algorithm again:

int i = 0;
for (i = 2; i < 20; i++)
{
    bool cont = false;

    int j = 0;
    for (j = 2; j <= Math.Sqrt(i); j++)
    {
        if (i % j == 0)
        {
            cont = true;
            break;
        }
    }

    if (cont)
        continue;

    Console.WriteLine(i);
}

Let’s do the mechanical translation of what the code above may look like, employing similar techniques as the if-statements for the for-loops: operands become lambda expressions, keywords become calls:

int i = 0;
Control.For(() => i = 2, () => i < 20, () => i++, () =>
{
    bool cont = false;

    int j = 0;
    Control.For(() => j = 2, () => j <= Math.Sqrt(i), () => j++, () =>
    {
        Control.If(() => i % j == 0, () =>
        {
            cont = true;
            break;
        });
    });

    Control.If(() => cont, () => {
        continue;
    });

    Console.WriteLine(i);
});

Not bad. But we have a serious problem. How to translate break and continue? And what about method returns?

void Bar()
{
    if (condition)
        return;

   
// Do more.
}

Turning this into the following doesn’t yield the correct result:

void Bar()
{
    Control.If(() => condition, () => {
        return;
    });

   
// Do more.
}

Instead of returning from the outer method, we return from the lambda expression. We’re actually unlucky the code above compiles, for if the outer method would return something, the resulting return-statement would be invalid:

int Foo()
{
    Control.If(() => condition, () => {
        return 42; // Can’t return an int from an Action delegate!
    });

   
// Do more.
}

Throughout the series we’ll have a look into these problems and come up with solutions everyone is going to beat the author for on the head. But again, this is a brain exercise if nothing more.

 

Next time on CC#…

Simple loops and non-local returns. Stay tuned!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under:

Comments

# re: Bart’s Control Library – Not What You Think It Is – Part 0

Saturday, July 11, 2009 6:49 PM by Cheetah

Maybe I'm nuts, but the obvious motivation for experimenting with these kinds of ideas is to make it possible to represent at least the body of any C# method as an Expression object, or perhaps a somewhat more capable object.

LINQ has already demonstrated how useful it can be to be able to express code as data, and this seems like it would take that many steps further.

Having, built into the runtime, the ability to parse code into an expression tree, turn an expression tree into IL, would be very useful.  Code analysis programs get easier, runtime generated code becomes much simpler, the possibilities seem pretty endless to me.

Having the "default" compiler use the structures seems a bit silly at first, but if you want to be able to process them at runtime, it seems like a good way to ensure that there aren't hidden bugs and differences in the way the two compilers operate to only have one compiler.

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #388

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #388

# re: Bart’s Control Library – Not What You Think It Is – Part 0

Monday, July 13, 2009 2:02 AM by bart

Hi Cheetah,

Definitely not nuts. And in fact, part 1 shows how .NET 4.0 has statement trees. Having a code object model is definitely a good thing and opens for all the scenarios you mention, and more. It's sort of a lucky coincidence LINQ came around to establish such notions in the framework and even as a proper language feature, and around the same time the DLR created similar concepts. In .NET 4.0 those are merged into one object model.

As a matter of fact, as revealed on PDC 2008, the managed languages teams are currently rewriting the compilers in C# itself. How this will precisely look ultimately is yet unknown, but consolidating parallel stacks for this kind of stuff is definitely an important aspect to this.

Thanks,

-Bart

# Bart’s Control Library – Not What You Think It Is – Part 0 - Bart De Smet

Monday, July 13, 2009 5:51 AM by DotNetShoutout

Thank you for submitting this cool story - Trackback from DotNetShoutout

# Daily Links for Tuesday, July 14th, 2009

Tuesday, July 14, 2009 4:33 AM by Daily Links for Tuesday, July 14th, 2009

Pingback from  Daily Links for Tuesday, July 14th, 2009

# Learning resources for C# 4.0 and .NET 4.0 new features &laquo; Bogdan Brinzarea&#8217;s blog

Pingback from  Learning resources for C# 4.0 and .NET 4.0 new features &laquo;  Bogdan Brinzarea&#8217;s blog

# Learning resources for C# 4.0 and .NET 4.0 new features | Noman Ali Blog

Pingback from  Learning resources for C# 4.0 and .NET 4.0 new features | Noman Ali Blog

# Less Language | AllGraphicsOnline.com

Monday, April 25, 2011 4:45 AM by Less Language | AllGraphicsOnline.com

Pingback from  Less Language | AllGraphicsOnline.com