Sunday, July 29, 2007 7:37 PM bart

Get familiar with expression trees using the ExpressionLearner sample

In order to help people to understand the hows of expression trees, I've created a simple application called the ExpressionLearner (download it here). It's written for Orcas Beta 2 and shows how expression trees work. A quick overview.

 

What are expression trees?

I assume you know a bit of programming with functions (I'm intentionally not talking about "functional programming" as in languages like Haskell, although we're getting closer and closer to such languages with the advent of .NET Framework 3.5). To set our mind, consider the following piece of code in C# 1.0:

delegate int BinOp(int a, int b);

class Calculator
{
   static void Main()
   {
      Console.WriteLine(DoBinOp(new BinOp(Add), 1, 2));
   }

   static int Add(int a, int b)
   {
      return a + b;
   }

   static int DoBinOp(BinOp op, int a, int b)
   {
      return op(a, b);
   }
}

I think you agree this code is pretty heavy for what it's supposed to do. In C# 2.0 things got easier with anonymous methods:

delegate int BinOp(int a, int b);

class Calculator
{
   static void Main()
   {
      Console.WriteLine(DoBinOp(delegate(int a, int b) { return a + b; }, 1, 2));
   }

   static int DoBinOp(BinOp op, int a, int b)
   {
      return op(a, b);
   }
}

But what about that ugly piece of inline code just to add two numbers? That's why we now have lambdas in C# 3.0:

delegate int BinOp(int a, int b);

class Calculator
{
   static void Main()
   {
      Console.WriteLine(DoBinOp((a,b) => a + b, 1, 2));
   }

   static int DoBinOp(BinOp op, int a, int b)
   {
      return op(a, b);
   }
}

and we could even get rid of the BinOp delegate thanks to the BCL generic Func<T1,T2,R> delegate (other "overloads" on generic type parameters exist; the T params stand for inputs, R is the - functional - output):

class Calculator
{
   static void Main()
   {
      Console.WriteLine(DoBinOp((a,b) => a + b, 1, 2));
   }

   static int DoBinOp(Func<int,int,int> op, int a, int b)
   {
      return op(a, b);
   }
}

So far, so good. But what does this have to do with expression trees? Before I can tell this, reinspect the code above. All of this C# 3.0 code gets compiled into IL code that's ready for execution by the CLR:

image

where the lambda was translated into an anonymous method, called "<Main>b__0":

image

Now, side-step to LINQ. Using LINQ, you can write queries like this:

var res = from p in products where p.UnitPrice >= 100 select p.ProductName;

In reality, this piece of code gets translated into a chain of (extension) method calls, like this:

var res = products.Where(p => p.UnitPrice >= 100).Select(p => p.ProductName);

Observe the two lambdas. But what's next? Where do Where and Select come from? It depends. If you're using LINQ to Objects, it comes from a set of extension methods on System.Linq.Enumerable (simplified code below):

static class Enumerable
{
   public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T,bool> predicate)
   {
      foreach (T item in source)
         if (predicate(item))
            yield return item;
   }

   public static IEnumerable<R> Select<T,R>(this IEnumerable<T> source, Func<T,R> project)
   {
      foreach (T item in source)
         yield return project(item);
   }
}

So, essentially the LINQ query above will get translated into IL code from A to Z, ready for direct execution on the target computer. This is because we've assume "products" is an IEnumerable<T>, thus the extension methods for IEnumerable<T> take effect. But what if "products" isn't an IEnumerable<T> (for example an IQueryable<T> as I'll blog about again in a future post)? Assume it's some kind of class that acts as a proxy to write queries against while the written queries are intended to be executed remotely, e.g. in a target language such as SQL or CAML or ... In such a case we can't do anything with IL code (well, übergeeks could use IL and disassemble/decompile it prior to converting it to the target query language); instead, we'd like to have the same query represented in some other intermediate format we can deal with ourselves in the way we see fit (we = the implementor of the class you write the queries against). This is where expression trees enter the stage.

Let's go back to the original calculator sample and change the code a little:

class Calculator
{
   static void Main()
   {
      Console.WriteLine(DoBinOp((a,b) => a + b, 1, 2));
   }

   static int DoBinOp(Expression<Func<int,int,int>> op, int a, int b)
   {
      return (int)op.Compile().DynamicInvoke(a, b);
   }
}

There's one core difference in here: the signature of the DoBinOp method now has an Expression<Func<int,int,int>> as its first parameter, instead of Func<int,int,int>. Although I've changed the implementation of the DoBinOp method (which I'll discuss later), you can ignore this for now. Observe however that the caller of the code doesn't see a change. However, the underlined lambda is now compiled to something completely different than regular IL instructions; instead, it's compiled to IL code that generates an in-memory expression tree representation of the original code at runtime. Why does the compiler take that decision? Because it has to assign the lambda to a variable of type Expression<Func<...>> (in this case to such a parameter). In IL, it looks like this:

image

This Main method code is equivalent to the following (self-written) code:

static void Main()
{
   ParameterExpression a = Expression.Parameter(typeof(int), "a");
   ParameterExpression b = Expression.Parameter(typeof(int), "b");
   BinaryExpression add = Expression.Add(a, b);
   Expression<Func<int,int,int>> l = Expression.Lambda<Func<int,int,int>>(add, a, b);

   Console.WriteLine(DoBinOp(l, 1, 2));
}

Thus, essentialy, you can think of expression trees are the data-representation of an AST (abstract syntax tree). In other words, an expression tree represents a piece of code as data. Not any piece of code can be transformed into an expression tree however. In contrast to code-generation mechanisms like CodeDOM, expression trees cannot represent statements (CS0834: A lambda expression with a statement body cannot be converted to an expression tree):

Func<int, int> abs = (int a) => { if (a >= 0) return a; else return -a; }; //compiles

Expression<Func<int, int>> abs = (int a) => { if (a >= 0) return a; else return -a; }; //doesn't compile (CS0834)

So, what the DoBinOp method gets from its caller is an expression tree instead of a delegate instance. One thing it can do with that expression tree (that represents a lambda expression) is to compile it into IL at that stage of the game, which is what happens in this line of code:

return (int)op.Compile().DynamicInvoke(a, b);

Instead, you could interpret the expression tree in order to execute it. Or, and that's what LINQ custom query providers do, you could translate the expression tree in some target language for further execution by another system (e.g. a DBMS).

 

The Expression Learner

The goal of the Expression Learner sample is to show translations of various lambdas to their corresponding expression trees. In the System.Linq.Expressions namespace one can find the enumeration called ExpressionType that hosts 46 values:

namespace System.Linq.Expressions
{
    // Summary:
    //     Describes the node types for the nodes of an expression tree.
    public enum ExpressionType
    {
        Add = 0,
        AddChecked = 1,
        And = 2,
        AndAlso = 3,
        ...
        TypeIs = 45,
    }
}

For each of these possible expression types, one method has been supplied that illustrates the C# 3.0 equivalent (where applicable, i.e. in almost all cases) to such an expression. At the same time, the code allows for dynamic compilation and invocation of the lambdas in order to test their functionality. The 'learner' should be of most interest to anyone who wants to use expression trees explicitly (i.e. not just for writing LINQ queries, but to parse trees, e.g. as part of a LINQ query provider implementation or as part of some kind of expression interpreter engine). Below is a screenshot of the sample in action:

image

And here's a screenshot of the dynamic invocation:

image

You might want to take a look at the Main method too, since it uses a few LINQ to Objects queries itself in order to get all of the sample methods dynamically.

Download now and enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# 9 Links Today (2007-07-30)

Monday, July 30, 2007 8:25 AM by 9 Links Today (2007-07-30)

Pingback from  9 Links Today (2007-07-30)

# .Sitecore &raquo; Running trough my bloglist

Tuesday, August 14, 2007 4:48 AM by .Sitecore » Running trough my bloglist

Pingback from  .Sitecore &raquo; Running trough my bloglist

# http://bartdesmet.net/blogs/bart/archive/2007/07/29/get-familiar-with-expression-trees-using-the-expressionlearner-sample.aspx

# C# 3.0 Feature Focus – Link Collection

Saturday, August 09, 2008 7:13 AM by B# .NET Blog

Collecting a few of my posts for easy quick reference: C# 3.0 Feature Focus - Part 1 - Local Type Inference