Friday, April 11, 2008 2:00 AM bart

Pattern Matching in C# - Part 5

Remark: Some readers have asked me for the sources of all this magic. Since this series is based on an extraction from a bigger project and I'm composing the decoupled (and slightly simplified) pattern matcher as I'm writing these blog posts, the source isn't in a publishable condition yet. To put the crown on the work, I'll publish the sources with the last post in this series, so keep on reading to absorb the principles, capabilities and design rationale of it.

 

In this series' last post we did put some effort into matching constants which could be considered the simplest of patterns (although there were quite some caveats to work out as we saw). This time we move back to the world of objects and extend upon the notions developed in part 1. Recall we worked on supporting patterns like this:

string message = new Pattern<string>(x)
   .Match((string name) => new Person(name, 25), name => name + " is 25 years.")
   .Match((int age) => new Person("John", age), age=> "John is " + age + " years.")
   .Else(() => "Unmatched object");

Although this approach worked out pretty well, it doesn't work on today's typical objects because we required some annotation to the constructor's parameters, like this:

public Person([Property("Name")]string name, [Property("Age")]int age)
{
   Name = name; 
   Age = age;
}

However, C# 3.0 and VB 9.0 have this new syntax called object initializers:

which allows us to write things like:

new Person() { Name = "Bart", Age = 25 }

An expression-based initialization syntax became a real need with LINQ in order to express projections and indeed, LINQ query providers like LINQ to SQL inspect the expression tree associated with a projection clause to figure out which fields to retrieve from the data store, possibly mixed with other operations, nested queries etc. What I'm aiming at is simply the fact those language constructs, since it are expressions, can be expressed in expression trees:

Expression<Func<Person>> e = () => new Person() { Name = "Bart", Age = 25 };

Enter MemberInitExpression:

image

 

The anatomy of MemberInitExpression

So what makes up a MemberInitExpression? Two things:

image

A NewExpression, as we've already crawled through before, and a set of Bindings. This structure will allow us to decompose the unification of such an expression into parsing the NewExpression based on our previous unification (which supports opt-out since one can call the default constructor that doesn't require any metadata for its parameters obviously) followed by the unification of the bindings which look like this:

image

Those bindings are of type MemberAssignment and contain information about the underlying member (field, property) that gets initialized in the object initializer. The composability of expressions (and hence expression trees) yields some interesting results though. Those assignments can be quite complicated:

new Person() { Name = "Bart", Age = 25, Address.Zip = 98004, Hobbies = new { "Programming", "Reading" } };

In here, we're recursively defining members of members (which is a MemberBinding binding type of MemberAssignment) and defining a list using collection initializer syntax (which is a ListBinding binding type of MemberAssignment). To trim down complexity for the purpose of this post, we'll stick with the Assignment binding type only which allows to set properties/fields as in the original sample.

 

Intermezzo - making symbolic unification work

Another case we didn't quite handle yet explicitly before is this:

.Match((int x) => new Point() { X = x, Y = x }, x => { ... })

This gets all of the points that are on the first/third bisector of the orthogonal 2D space. Assume for a second we have this instead, using our old mapping metadata for properties (the Point type should be very predicable):

.Match((int x) => new Point(x, x), x => { ... })

For the moment our pattern matcher doesn't handle this case correctly, the reason being that a ParameterExpression is bound to one property. In the heart of our unification algorithm, we have something like this:

foreach (ParameterInfo param in ne.Constructor.GetParameters())
{
    ...

    ParameterExpression pe;

    if ((pe = arg as ParameterExpression) != null)
    {
        bindings[pe] = property;
    }
    else
       
...
}

The problematic line is indicated in bold. Since we want unification to work across different types of expressions we need to tackle this problem first. It wouldn't be unrealistic to see code like this:

.Match((int x) => new Point(x, x) { Z = x }, x => { ... })

where a NewExpression is embedded in a MemberInitExpression, so in this case we're looking for points that match the criterion p.X = p.Y = p.Z = x. What are the assignment targets? Likely properties but public fields could sneak in as well (and would in this context (!) be even nicer since they are predictable: what goes in must come out identically). Therefore, we replace the type of bindings from:

Dictionary<ParameterExpression, PropertyInfo> bindings

to

Dictionary<ParameterExpression, MemberInfo> bindings

Now unification needs to do a little more work to enforce the equality of all the members bound to the same parameter, e.g. in:

.Match((int x) => new Point(x, x), x => { ... })

parameter 'x' is bound to both properties X and Y on type Point. Currently, we don't have any matching condition: both properties are unbound, so any point will match. Instead we need to emit conditions like:

Point p;
if ((p = o as Point) != null && p.X == p.Y) { ... }

We're currently missing the part in bold. Again, we're going to a take a cumulative approach, adding stuff to the check condition variable: if a parameter already has a binding, we emit an equality check for both the old and new binding:

foreach (ParameterInfo param in ne.Constructor.GetParameters())
{
    ...

    ParameterExpression pe;

    if ((pe = arg as ParameterExpression) != null)
    {
        if (bindings.ContainsKey(pe))
       
{
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, bindings[pe], property)
            );

        }
        else
            bindings[pe] = property;
    }
    else
       
...
}

The helper method to emit the equality check looks like this:

private Expression GetEqualityCheck(ParameterExpression o, Type target, MemberInfo left, MemberInfo right)
{
    return Expression.Call(
        typeof(object),
        "Equals",
        new Type[0],
        Expression.Convert(
            Expression.PropertyOrField(
                Expression.Convert(
                    o,
                    target
                ),
                left.Name
            ),
            typeof(object)
        ),
        Expression.Convert(
            Expression.PropertyOrField(
                Expression.Convert(
                    o,
                    target
                ),
                right.Name
            ),
            typeof(object)
        )
    );
}

By now we all should now matching takes two faces: unification and extraction. The extraction currently looks like this:

//
// Create list of unbound parameters.
//

Expression[] args = new Expression[Match.Parameters.Count];
int j = 0;
foreach (ParameterExpression param in Match.Parameters)
{
    //
    // Parameters need to be bound in the match expression.
    //
    PropertyInfo property;
    if (!bindings.TryGetValue(param, out property))
        throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

    //
    // Null for identity checks; grab from identified property otherwise.
    //
    args[j++] = (property == null ? me : Expression.Property(me, property));
}

In here Match stands for the lambda expression corresponding to the match clause. We look for all parameters, determine the property they're bound to and stuff it into an array for extraction. This time, we'll have possibly more than one binding but all will point at the same value, so we can pick an arbitrary one, i.e. in:

.Match((int x) => new Point(x, x), x => { ... })

it doesn't matter whether we extract the X property from the object or the Y property since we want them to be the same, and indeed in our unification algorithm we only stored one of the many. Therefore the code doesn't need much of a change, just patching up the type of the binding:

//
// Create list of unbound parameters.
//

Expression[] args = new Expression[Match.Parameters.Count];
int j = 0;
foreach (ParameterExpression param in Match.Parameters)
{
    //
    // Parameters need to be bound in the match expression.
    //
    MemberInfo member;
    if (!bindings.TryGetValue(param, out member))
        throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

    //
    // Null for identity checks; grab from identified property otherwise.
    //
    args[j++] = (member== null ? me : Expression.PropertyOrField(me, member.Name));
}

Time for a sample:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point(x, x), x => true)
    .Else(() => false);

bisect.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

It produces:

image

which is clearly less than 100 generated points and all match the condition. Congratulations - symbolic pattern matching at work! Of course things can get much more complicated, like this:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point(x, 2 * x), x => true)
    .Else(() => false);

What do you think the 2 * x will look like? Exactly, yet another expression tree. How could you handle this (left to the reader as an exercise)? A few tips:

  • Make sure you understand the semantics correctly: in the sample above we're demanding that p.Y == 2 * x.
  • Think about the translation that we're doing:
    • Source: The match clause can be any function of type Func<..., object> taking in an arbitrary number of parameters. E.g. .Match((int x, int y) => new Point(x, y, x + y), (x, y) => ...)
    • Target: The compiled matcher becomes of type Func<object, bool> taking in an arbitrary object determining whether it matches or not.
  • Order of parameter use can be arbitrary, e.g.: .Match((int y) => new Point(2 * y, y), y => ...)
  • Distinguish between trivial bindings and complex bindings.
  • (Golden tip): trees can be patched...
  • (Only for true geeks): mathematic expressions can be rewritten... Think about this one: .Match((int a, int b) => new Point(a + 1, b * 2, a * b), (a, b) => ...)

 

MemberInitExpression - parse time!

So, what should our matcher support? A breakdown:

.Match((string name) => new Person() { Name = name, Age = 25 }, name => { ... })

means we want to get people with Age 25 and extract their name to feed it into the match body. Obviously, our MatchEntry::Compile method needs to be enhanced:

NewExpression ne;
ConstantExpression ce;
ParameterExpression pe;
MemberInitExpression mie;

Expression match;
if ((ne = Match.Body as NewExpression) != null)
    match = CompileNewExpression(o, target, ne, bindings);
else if ((ce = Match.Body as ConstantExpression) != null)
    match = CompileConstantExpression(o, target, ce, bindings);
else if ((pe = Match.Body as ParameterExpression) != null)
    match = CompileParameterExpression(o, target, pe, bindings);
else if ((mie = Match.Body as MemberInitExpression) != null)
    match = CompileMemberInitExpression(o, target, mie, bindings);
else
    throw new NotSupportedException("Unsupported expression type in match clause.");

The code for the MemberInit parser looks like this:

private Expression CompileMemberInitExpression(ParameterExpression o, Type target, MemberInitExpression me, Dictionary<ParameterExpression, MemberInfo> bindings)
{
    //
    // Compile embedded new expression.
    //

    Expression check = CompileNewExpression(o, target, me.NewExpression, bindings);

    //
    // Parse bindings.
    //
    foreach (MemberBinding binding in me.Bindings)
    {
        MemberAssignment ma = binding as MemberAssignment;
        if (ma == null)
            throw new NotSupportedException("Only top-level regular assignment bindings are supported.");

        check = TryBind(check, o, target, ma.Member, ma.Expression, bindings);
    }

    return check;
}

Notice we're parsing the constructor first. MemberInitExpressions always have a call to a constructor embedded in them, so we extend upon that functionality. More importantly, notice the conciseness of the loop thanks to some refactoring into a TryBind method. Make the following observation. When parsing a NewExpression we do precisely the same: for each argument we try to establish a binding or emit a check. Therefore that code can be shared between CompileNewExpression and CompileMemberInitExpression. Let's take a look at the rewritten CompileNewExpression first:

private Expression CompileNewExpression(ParameterExpression o, Type target, MemberInitExpression me, Dictionary<ParameterExpression, MemberInfo> bindings)
{
    //
    // Check the type of the object against the expected type.
    //

    Expression check = GetTypeCheck(o, target);

    //
    // Resolve parameters.
    //

    int i = 0;
    foreach (ParameterInfo param in ne.Constructor.GetParameters())
    {
        //
        // Mapping information from constructor parameters to properties required.
        //
        PropertyAttribute pa = param.GetCustomAttributes(typeof(PropertyAttribute), false).Cast<PropertyAttribute>().SingleOrDefault();
        if (pa == null)
            throw new InvalidOperationException("Input object doesn't have required mapping information.");

        //
        // Find the property.
        //
        PropertyInfo property = target.GetProperty(pa.Name);
        if (property == null)
            throw new InvalidOperationException(String.Format("Property {0} on type {1} not found.", pa.Name, target.Name));

        check = TryBind(check, o, target, property, ne.Arguments[i++], bindings);
    }

    return check;
}

This too is fairly straightforward, we just need to infer the property through the metadata on the constructor arguments first. Time to move on to the TryBind method which acts as an accumulator for bindings and an extended for match expressions:

private Expression TryBind(Expression check, ParameterExpression o, Type target, MemberInfo member, Expression arg, Dictionary<ParameterExpression, MemberInfo> bindings)
{
    ConstantExpression ce;
    ParameterExpression pe;

    //
    // Expression should either be a constant or a parameter reference.
    //
    if ((ce = arg as ConstantExpression) != null)
    {
        //
        // Check the object's property matches the constant.
        //
        check = Expression.AndAlso(
            check,
            GetEqualityCheck(o, target, member, ce)
        );
    }
    else if ((pe = arg as ParameterExpression) != null)
    {
        //
        // Keep track of the loose parameter.
        //
        if (bindings.ContainsKey(pe))
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, bindings[pe], member)
            );
        }
        else
            bindings[pe] = member;
    }
    else
        throw new NotSupportedException("Can only match constants.");

    return check;
}

That's it! Here's our rewritten Point example:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point() { X = x, Y = x }, x => true)
    .Else(() => false);

bisect.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

and here's the result:

image

Randomly different of course, but still up and running!

 

Next time - a little trip through the art of matching collections.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: , ,

Comments

# Pattern Matching in C# - Part 6

Monday, April 14, 2008 10:22 AM by B# .NET Blog

Monday morning: The Return of The Pattern Matcher. After an awesome weekend (well, a Saturday at least

# Adventures in F# - F# 101 Part 8 (Mutables and Reference Cells)

Tuesday, April 15, 2008 4:09 PM by Matthew Podwysocki's Blog

Time for another adventure in F#, covering some of the basics of functional programming and F# in particular

# Adventures in F# - F# 101 Part 8 (Mutables and Reference Cells)

Tuesday, April 15, 2008 4:17 PM by Community Blogs

Time for another adventure in F#, covering some of the basics of functional programming and F# in particular

# bart pe

Thursday, June 26, 2008 5:22 AM by bart pe

Pingback from  bart pe

# language tree

Thursday, June 26, 2008 7:22 AM by language tree

Pingback from  language tree