Wednesday, April 16, 2008 2:00 AM bart

Pattern Matching in C# - Part 8

In the last handful of posts in this series we've been looking at ways to match a whole set of patterns, including:

There's not that much left to apply (meaningful) matches for (feel free to think of others of course) so from this post on, we'll change the theme and start to focus a little more on translation of pattern matches into various executable forms. So far, we've been considering interpretation and a lightweight form of compilation by invoking the lambda expression compiler of .NET 3.5. We already saw in part 3 that the overall performance of a compiled pattern match is very reasonable given the fact we're faced with more work at runtime but once parsed and compiled, execution goes amazingly fast. So, we won't focus much on performance tuning again for now, but rather analyze the translation mechanism employed.

 

Where we're at right now

Our current form of compilation basically consists of this:

  • The Pattern<T> object aggregates pairs of match clause expression trees and match body delegates into MatchEntry objects.
  • Compilation is carried out on individual MatchEntry objects.
  • Each MatchEntry, when compiled, is turned into an efficient matcher and invoker:

    internal class MatchEntry
    {
        public LambdaExpression Match { get; set; }
        public Delegate Action { get; set; }

        public Func<object, bool> CompiledMatch { get; set; }
        public Func<object, T> CompiledInvoker { get; set; }

        internal void Compile()
        {
            ...
        }
    }

Although this works fine, we still do have some plumbing around that's looking a bit ugly. In order to match an object, we cycle through the list of MatchEntry objects and evaluate the compiled matcher. If it evaluates true, we hand off the object to the compiled invoker and return the resulting object with type T.

 

On to a more functional matcher

The interesting thing in the previous observation is a single letter: 'T'. Indeed, the pattern matcher returns an object of type T representing the result. In other words, it's a (fairly) complex function, or stated otherwise a valued expression. From this point on, it really matters what camp you're in:

  • Pattern matches as switch statements - or -
  • pattern matches as switch expressions

The latter obviously doesn't exist in C# (and many other languages of course) since switches are statements. In other words, they are used for their interesting side-effects (which might or might not set one piece of state, yielding some kind of hidden function). Switching to an expression type of switch would conceptually look like this:

int? age = switch (name)
{
    case "Bart":
        25;
        break;
    case "John":
        52;
        break;
    default:
        null;
        break;
}

It looks weird (hence the qualifier "conceptually") but here's another hypothetical form that looks more functional:

int? age = switch (name)
{
    case "Bart" => 25;
    case "John" => 52;
    default => null;
}

A similar approach could be taken for an if statement turning it into an expression:

int? age = if (name == "Bart")
    25;
else if (name == "John")
    52;
else
    null;

but if you look carefully we already have this kind of thing with the ternary operator:

int? age = (name == "Bart" ? 25 : (name == "John" ? 52 : null))

which can be nested arbitrarily to form a nice expression. Taking it back to the switch, you could see it as:

int? age =
(
    name == "Bart" ? 25 : (
    name == "John" ? 52 :
    null)
)

By the way, to make this really work in C#, you'll have to give the type inference machinery a hand like this (cf. CS0173):

int? age =
(
    name == "Bart" ? 25 :
    name == "John" ? 52 :
    (int?) null
)

(nullables strike again). More important though is the fact we can omit parentheses in the above because of the ternary operators right associativity. Now, let's generalize a bit:

T result =
(
    condition-expr-1 ? result-expr-1 :
    condition-expr-2 ? result-expr-2 :
    ...
    condition-expr-n ? result-expr-n :
    else-expr
)

Wait a minute, don't these conditions represent match clauses? Precisely. The only thing we need to change is the fact the match body needs to become an expression rather than a statement. So ultimately our match entry should comprise the following two:

    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

The remaining question is how we should chain those pieces together.

 

ConditionalExpression

We've already picked our target (ternary operators), but we need to know what it looks like under the covers. The simplest way to find out about it is the following C# one-liner:

Expression<Func<bool,int>> cond = b => b ? 1 : 0;

By now you should already be prepared to see (and filter out mentally) a ParameterExpression, two ConstantExpressions and one LambdaExpression for the whole thing. The body of the lambda is what interests us and geeks as we are, let's rely on ILDASM once more:

IL_0000:  nop
IL_0001:  ldtoken    [mscorlib]System.Boolean
IL_0006:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_000b:  ldstr      "b"
IL_0010:  call       class [System.Core]System.Linq.Expressions.ParameterExpression [System.Core]System.Linq.Expressions.Expression::Parameter(class [mscorlib]System.Type,
                                                                                                                                               string)
IL_0015:  stloc.1
IL_0016:  ldloc.1
IL_0017:  ldc.i4.1
IL_0018:  box        [mscorlib]System.Int32
IL_001d:  ldtoken    [mscorlib]System.Int32
IL_0022:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_0027:  call       class [System.Core]System.Linq.Expressions.ConstantExpression [System.Core]System.Linq.Expressions.Expression::Constant(object,
                                                                                                                                             class [mscorlib]System.Type)
IL_002c:  ldc.i4.0
IL_002d:  box        [mscorlib]System.Int32
IL_0032:  ldtoken    [mscorlib]System.Int32
IL_0037:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_003c:  call       class [System.Core]System.Linq.Expressions.ConstantExpression [System.Core]System.Linq.Expressions.Expression::Constant(object,
                                                                                                                                             class [mscorlib]System.Type)
IL_0041:  call       class [System.Core]System.Linq.Expressions.ConditionalExpression [System.Core]System.Linq.Expressions.Expression::Condition(class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                 class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                 class [System.Core]System.Linq.Expressions.Expression)

IL_0046:  ldc.i4.1
IL_0047:  newarr     [System.Core]System.Linq.Expressions.ParameterExpression
IL_004c:  stloc.2
IL_004d:  ldloc.2
IL_004e:  ldc.i4.0
IL_004f:  ldloc.1
IL_0050:  stelem.ref
IL_0051:  ldloc.2
IL_0052:  call       class [System.Core]System.Linq.Expressions.Expression`1<!!0> [System.Core]System.Linq.Expressions.Expression::Lambda<class [System.Core]System.Func`2<bool,int32>>(class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                                                        class [System.Core]System.Linq.Expressions.ParameterExpression[])
IL_0057:  stloc.0
IL_0058:  ret

The mechanism is surprisingly simple again thanks to the composable nature of the System.Linq.Expressions namespace types. Here's the C# equivalent of the above:

var b = Expression.Parameter(typeof(bool), "b");
var cond = Expression.Lambda(
    Expression.Conditional(
        b,
        Expression.Constant(1),
        Expression.Constant(0)
    ),
    b
);

The remaining thing for us to do is to map the entire matcher onto on big nested Expression.Conditional wrapped in a lambda. What does the lambda need to take in? Obviously the object being matched, so ultimately we'll be able to take the entire expression for the pattern matcher, compile it using the Compile method, and have a single delegate of type Func<object, T> that does the whole matching for us using efficient IL.

 

Revamping Pattern<T>

Our previous incarnation of Pattern<T> was rather quick and dirty in a sense that it didn't really enforce the fluent pattern at compile time. What we really want is a chain of Match calls that render a useless partial pattern which gets completed by adding an Else clause. Once that's done, the pattern can be compiled (for which different strategies can be considered, either we do it straight away - as shown below - or we require the user to trigger the compilation explicitly, e.g. at the first application of the pattern match on an object or by a separate Compile method call). So we'll start redefining our Pattern<T> class implementing the fluent interface pattern:

class Pattern<T>
{
    private List<MatchEntry> _entries = new List<MatchEntry>();

    public Pattern<T> Match<TR>(Expression<Func<TR>> clause, Expression<Func<T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, TR>(Expression<Func<T1, TR>> clause, Expression<Func<T1, T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, T2, TR>(Expression<Func<T1, T2, TR>> clause, Expression<Func<T1, T2, T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, T2, T3, TR>(Expression<Func<T1, T2, T3, TR>> clause, Expression<Func<T1, T2, T3, T>> body)
    {
        return MatchInternal(clause, body);
    }

    private Pattern<T> MatchInternal(LambdaExpression clause, LambdaExpression body)
    {
        _entries.Add(new MatchEntry() { Clause = clause, Body = body });
        return this;
    }

    public ClosedPattern<T> Else(Expression<Func<T>> @else)
    {
        return new ClosedPattern<T>(_entries, @else);
    }
}

The MatchEntry class now looks like this:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }
}

Notice how the Else method returns a ClosedPattern which represents a pattern that has all the information required to start applying the pattern to objects:

class ClosedPattern<T>
{
    private List<MatchEntry> _entries;
    private LambdaExpression _else; 

    private Func<object, T> _pattern;

    internal ClosedPattern(List<MatchEntry> entries, LambdaExpression @else)
    {
        _entries = entries;
        _else = @else;

        Compile();
    }

    private void Compile()
    {
        //
        // Compilation goes here.
        //

    }

    public T Execute(object o)
    {
        return _pattern(o);
    }
}

The most notable differences with the previous implementation are indicated in bold and outline the fact we're now fully expression-based.

 

Compilation of ClosedPattern<T>

At the core of our new pattern matcher implementation is the compilation of the closed pattern. Basically we have information representing something like:

T result =
(
    condition-expr-1 ? result-expr-1 :
    condition-expr-2 ? result-expr-2 :
    ...
    condition-expr-n ? result-expr-n :
    else-expr
)

where all the condition and result variables contain a reference to the object passed in. The way we match and map results is irrelevant for now, let's just assume we have a Func<object, bool> for each match clause and a Func<object, T> for each match body, which all live in peace inside the list of MatchEntry objects. In addition we have the lambda expression for the else clause. The way to thread things up is by starting at the bottom (in pseudo-code):

Expression pattern = else;

=>

pattern <- Expression.Condition(condition-expr-n, result-expr-n, pattern)

=>

...

=>

pattern <- Expression.Condition(condition-expr-1, result-expr-1, (..., Expression.Condition(condition-expr-n, result-expr-n, pattern)...))

Assuming we have this rewritten MatchEntry class:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

    public Expression Match { get; private set; }
    public Expression Map { get; private set; }

    public void Compile(ParameterExpression input)
    {
        //
        // Compilation goes here.
        //

    }
}

we can now write our compilation for the pattern like this:

private void Compile()
{
    ParameterExpression input = Expression.Parameter(typeof(object), "o");

    Expression pattern = _else.Body;

    foreach (var entry in Enumerable.Reverse(_entries))
    {
        entry.Compile(input);

        pattern = Expression.Condition(
            entry.Match,
            entry.Map,
            pattern
        );
    }

    _pattern = Expression.Lambda<Func<object,T>>(pattern, input).Compile();
}

This precisely captures the algorithm outlined above, building up the nested conditional expressions bottom-up. Ultimately, one pattern gets translated into on single delegate.

 

Compilation of MatchEntry objects

The remaining pieces of the puzzle are the Match and Map properties on MatchEntry:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

    public Expression Match { get; private set; }
    public Expression Map { get; private set; }

    public void Compile(ParameterExpression input)
    {
        //
        // Compilation goes here.
        //

    }
}

The Compile method needs to take Clause and Body and translate those into Match and Map respectively. The types represented by the expression objects will make things clear: Match is an expression producing a Boolean determining whether or not the input object (represented by ParameterExpression input) matches the entry; Map on the other hand takes a (matched) object and turns it into the output type T.

What's even better is we can reuse most of the original code of the unification engine. Below you can see the changes needed to our current Compile method in order to generate the new match expression (omitted a bunch of calls for specific expression types besides NewExpression):

internal void Compile(ParameterExpression input)
{
    //
    // Lambda input parameter.
    //
    ParameterExpression o = Expression.Parameter(typeof(object), "o");

    //
    // Created target type.
    //
    Type target = ClauseMatch.Body.Type;

    //
    // List of bindings.
    //

    var bindings = new Dictionary<ParameterExpression, Expression>();

    NewExpression ne;
    ...

    Expression match;
    if ((ne = ClauseMatch.Body as NewExpression) != null)
        match = CompileNewExpression(oinput, target, ne, bindings);
    ...
    else
        throw new
NotSupportedException("Unsupported expression type in match clause.");

    //
    // Store Compile match predicate.
    //
    CompiledMatch = Expression.Lambda<Func<object, bool>>(match, o).Compile();

The second part took care of the mapping part to execute the body. Again, we eliminate compilation on a per-MatchEntry basis and simply keep the expression tree around. It's just a little trickier because originally we did have a delegate called Action representing the match body. However, not this is represented as an expression tree too which represents the lambda expression that takes in the extracted values and produces the result of type T. Since the Body is simply a LambdaExpression now, we can feed it into Expression.Invoke straight away:

    //
    // Conversion from bound object to the target type.
    //

    Expression me = Expression.Convert(o input,  target);

    //
    // Create list of unbound parameters.
    //

    Expression[] args = new Expression[ClauseMatch.Parameters.Count];
    int j = 0;
    foreach (ParameterExpression param in ClauseMatch.Parameters)
    {
        //
        // Parameters need to be bound in the match expression.
        //

        Expression map;
        if (!bindings.TryGetValue(param, out map))
            throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

        //
        // Null for identity checks; grab from identified property otherwise.
        //

        args[j++] = map ?? me;
    }

    //
    // Store Compile invoker function.
    //

    Expression invoker Map = Expression.Invoke(BodyExpression.Constant(Action), args);
    CompiledInvoker = Expression.Lambda<Func<object, T>>(invoker, o).Compile();
}

Essentially we've been collecting trees in the MatchEntry's Compile method invocations and put them on a nice row inside the ClosedPattern's Compile method in order to create a pattern matching avenue.

 

Testing it

Let's take this sample again:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point() { X = x, Y = x }, x => true)
    .Else(() => false);

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

Exercise: what's the type of bisect in the code above?

With the help of one breakpoint and the Immediate Window we can figure out what this translates into:

image

VB'ers will recognize IIF as the equivalent for the ternary operator but in a prefix form (actually, this is a lie in terms of VB - IIF is not a truly ternary operator, it's a function - VB 9.0 has a new If operator that's a real ternary operator). We just see one single ternary here, but you can see the whole thing working. First the condition:

((o Is Point) && Equals(Convert(Convert(o).X), Convert(Convert(o).Y)))

is

o is Point && ((Point)o).X == ((Point)o).Y

which represents the condition imposed by

(int x) => new Point() { X = x, Y = x }

The true part of the ConditionalExpression is:

Invoke(x => True,Convert(o).X)

which stands for

(x => true)(((Point)o).X)

where the first part represents a lambda and the second part calls it with argument "point's X value".

And finally the else got translated into False, which clearly stands for false.

Exercise: Map the following ToString output of the internal pattern expression tree representation onto the corresponding high-level pattern match (trivial one): "IIF((Convert(o) = 1), Invoke(() => False), IIF((Convert(o) = 2), Invoke(() => True), IIF((Convert(o) = 3), Invoke(() => False), IIF((Convert(o) = 4), Invoke(() => True), False))))"

You can imagine what mixed pattern matches will start to look like, don't you?

 

1 pattern match => 1 delegate

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: , ,

Comments

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #76

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #76

# Morning Coffee 164 &#8211; DevHawk

Sunday, April 17, 2011 7:31 AM by Morning Coffee 164 – DevHawk

Pingback from  Morning Coffee 164 &#8211; DevHawk