Thursday, April 10, 2008 2:00 AM bart

Pattern Matching in C# - Part 4

Last time around in this series we won the battle of performance, going all the way from an interpreter-driven pattern matcher to a fully on-the-fly compiled one, with great results almost reaching the performance of a manual implementation. However, functionality-wise our pattern matcher is rather restricted, only supporting the NewExpression. Nonetheless, there are many other interesting opportunities to introduce matches for. In this post, we'll talk only about constants. Are constants so interesting you might wonder? You bet they are :-).

 

ConstantExpression

This one was just too simple to start with; its simplicity reflects the borderline between a full-blown pattern matcher and the degenerated case of a switch expression. In other words, you can drop the switch I presented before and layer it on top of the pattern matcher. In order to do so, the pattern matcher needs to provide support to match constants obviously. Take a look back at our MatchEntry.Compile method:

internal void Compile()
{
    //
    // Only support for NewExpression.
    //

    NewExpression ne = Match.Body as NewExpression;
    if (ne == null)
        throw new NotSupportedException("Match clause can only contain NewExpression.");

Since we can expect any kind of expression as the Match lambda expression's body, we should add checks to the above to distinguish between cases. A non-atypical structure looks like this:

    NewExpression ne;
    ConstantExpression ce;

    if ((ne = Match.Body as NewExpression) != null)
    { ... }
    else if ((ce = Match.Body as ConstantExpression) != null)
    { ... }
    else
        throw new NotSupportedException("Unsupported expression detected.");

Not unsurprisingly (the last double negation for today, I promise) this pattern looks familiar - it's nothing more than a type-switch, which - as we said before - can be expressed in terms of a degenerated pattern matcher. It should be apparent that the way towards a meta-circular interpreter isn't too long from this observation on. Anyhow, back to the order of the day. Here's a typical example of what we like to achieve:

uint Fib(uint n)
{
   return new Pattern<uint>(n)
      .Match(() => 0, () => 1)
      .Match(() => 1, () => 1)
      .Else(() => Fib(n - 1) + Fib(n - 2)).Result;
}

Obviously, it's nothing more than a sample since this is likely a laureate in the "Who's written the stupidest implementation of Fibonacci?" competition. But well, it suits our goals. Notice by the way how the Else clause works by means of a closure that captures the local variable n. The implementation of a constant-based match clause unification is seemingly innocent:

private static bool TryMatchConstantExpression(object o, ConstantExpression ce, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    return Object.Equals(o, ce.Value);
}

And yes, this works but there's a little caveat with the sample above:

      .Match(() => 0, () => 1)

What's the type of the match clause? Func<int>. What's the type of the match body? Func<uint>. What do we feed into the pattern matcher? Right, a uint. So, what about comparability of System.Int32 against System.UInt32? Right, won't work. So either you make your matcher a smart as e.g. the C# compiler:

uint i = 0;
int j = 0;
bool b = i == j;

becomes

.locals init (uint32 V_0,
         int32 V_1,
         bool V_2)
...
ldc.i4.0
stloc.0
ldc.i4.0
stloc.1
ldloc.0
conv.u8
ldloc.1
conv.i8
ceq
stloc.2

with implicit conversions in place. Or, you simply patch up your pattern as follows:

uint Fib(uint n)
{
   return new Pattern<uint>(n)
      .Match(() => (uint)0, () => 1)
      .Match(() => (uint)1, () => 1)
      .Else(() => Fib(n - 1) + Fib(n - 2)).Result;
}

Question: Say with Fib(int i) you can calculate Fibonacci numbers up to i == x. Now switch to Fib(uint u), what do you expect to be the upper bound to the computability using this alternative function? Relate this to the discussion above.

The match implementation above obviously doesn't employ deferred execution using a compiled pattern. Making the switch to the compiled form as explained previously in this series should be trivial, or not? There's a little caveat (as usual) and the choice of Fibonacci as the running sample in this paragraph isn't just a coincidence. Recursion is tricky in lambda calculus, as explained bY others previouslY (notice this doesn't reflect the complexity of the platform, as some believe, but rather the complexity of the core mathematical theories - taken to the extreme - that provide the foundation of what we now call "functional programming"). It's maybe not completely apparent but in reality we've defined the pattern is terms of the place (method) where it's defined in itself. It might be an interesting exercise to trace back a few calculations of say Fib(4) and correlate this to memory usage. A way to make this callable is simply this:

static Pattern<int> _pattern = new Pattern<int>()
       .Match(() => 0, () => 1)
       .Match(() => 1, () => 1)
       .Match((int n) => n, n => Fib(n - 1) + Fib(n - 2))
       .Else(() => { throw new Exception("Unmatched value detected."); });

static int Fib(int n)
{
    return _pattern.Execute(n);
}

Don't bother about the static context, it just so happened to be like that in my collection of unit tests. Notice I've gently switched to use ints instead of uints, and I had to make a couple of changes. First of all, why doesn't it simply work anymore with a trailing Else handling the recursive case? Recall my remark on the subtle use of a closure in the original code, capturing the local variable inside the Fib method and using it inside the lambda expression of the Else case's match body. Since we're delaying execution now through an open pattern match object, there's no such thing as "n" to be captured from the outer scope. Indeed, the new value is going to be passed in to the pattern match machinery through the Execute method instead. To fix this, we need to get our hands on the value that's flowing through the pattern match, and what's better than an identity checking clause (x => x)?

       .Match((int n) => n, n => Fib(n - 1) + Fib(n - 2))

Notice however the type of the lambda expression's body representing the match clause: ParameterExpression. Actually the identity check is nothing more than a wildcard operation albeit with a type check upfront (this too can be lifted if you define a pattern over a closed domain, something we might cover in a later episode). Also notice the fact we now require an Else block to complete the match object because a match is a valued expression that should produce values for the entire input domain (something that can be lifted when working with closed domains while inferring the input domain coverage, i.e. if we'd know in the sample above the input can only be of type int, we can prove the pattern is complete with the first three matches - although it isn't defined meaningful for negative numbers).

So, how should we compile a ParameterExpression? Fairly straightforward:

private static bool TryMatchParameterExpression(object o, ParameterExpression pe, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    bindings[pe] = null;
    return pe.Type.IsAssignableFrom(o.GetType());
}

The first line is a bit tricky. As it stands now, our bindings map parameters onto properties, but the parameter itself (in this case) maps to the entire object (indeed, you could have a (Person p) => p as well), which we represent internally as a null-mapping. In addition to this, the matcher performs a type check since the object being matched won't necessarily match the clause (e.g. in the (Person p) => p sample, the pattern match object might be fed with a Giraffe object which only in rare circumstances takes human characteristics). As a matter of fact, this type check can be seen as the engine for a type switch layered on top of the pattern matcher. The change to the evaluator (or back-end compiler) is straightforward and just involves one additional null-check:

object value = property == null ? o : property.GetValue(o, null);

and

Expression me = Expression.Convert(o, target);
args[j++] = (property == null ? me : Expression.Property(me, property));

And here are finally the compile functions:

private Expression CompileParameterExpression(ParameterExpression o, Type target, ParameterExpression pe, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    bindings[pe] = null;

    return Expression.TypeIs(o, target);
}

private Expression CompileConstantExpression(ParameterExpression o, Type target, ConstantExpression ce, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    return Expression.Equal(
        Expression.Convert(o, ce.Type),
        ce
    );
}

Time for a little geek exercise. In the screenshot below I'm showing off a tempting attempt to make our beloved Fibonacci count his rabbits in a local method scope:

image

However, our even more beloved C# compiler indicates something is wrong. Can you guess what and propose a fix? Does it work as you expect? Notice the following code doesn't compile either but the fix for this issue is trivial (and similar - see Wes' blog for more info):

Func<int, int> fib = n => n > 1 ? fib(n - 1) + fib(n - 2) : 1;

Anyway, here's a complete sample at work:

image

and here are the results:

image

 

Performance

Oh well, nice eye candy but how does it perform? Yesterday's post was definitely a victory but the nice thing about it was the constant cost for each pattern match. This time, we're creating a cascade of pattern executions by means of the recursive call to Fib inside the pattern match's third body. A quick trace down shows:

Fib(4) --> Fib(3) + Fib(2) --> Fib(2) + Fib(1) + Fib(1) + Fib(0) --> Fib(1) + Fib (0) + Fib(1) + Fib(1) + Fib(0)

The number of calls is exploding. Here's a detail of the call tree in CLR Profiler outlining two calculations:

image

Notice how the second line of the first box corresponds to the last line of the second box (both 7 calls). You'll see this pattern returning since those are the recursive calls to Fib(1). To make this cascade more concrete, here's an outline of 10 iterations:

image

Ignore the first one (some heaving string concat going on there) - all the others nicely reflect the trend for subsequent iterations. Math freaks like me will do a little division of two subsequent numbers and will - not coincidently - see the following:

46  
63 63 / 46 = 1.37
96 96 / 63 = 1.52
146 146 / 96 = 1.57
229 229 / 146 = 1.58
362 362 / 229 = 1.60
578 578 / 362 = 1.60
927 927 / 578 = 1.61
1492 1492 / 927 = 1.61

And if you'd profile further, e.g. run 19 and 20 with values of 112434 and 181914 respectively, you'd come to the conclusion the ratio is nearly constant (run[20]/run[19] = 1.6179625...). This is the golden ratio which has the precise value of (SQRT(5) + 1) / 2 = 1.6180339... Whole sites have been devoted to the topic but here's the ultimate source. Fibonacci is worth a blog on its own and has many applications (including memory allocation schemas, cf. Ferguson [1976]). Now, why do we find this number? In essence, it simply reflects the ratio of calls required to calculate Fibonacci number n versus n - 1, which sole distinction is the recalculation of number n - 2. This does simply show the shortcoming of the algorithm, not really our implementation of the pattern matcher.

For example, if you do a really simplistic pattern match like this:

var even = new Pattern<bool>()
                .Match(() => 1, () => false)
                .Match(() => 2, () => true)
                .Match(() => 3, () => false)
                .Match(() => 4, () => true)
                .Else(() => false);

You'll see it performance nearly as good (or bad) at the imperative variant, with just a factor 3 or so the difference, e.g.:

var res_1 = from i in lst select even.Execute(i);
var res_2 = from i in lst select (i == 1 ? false : i == 2 ? true : i == 3 ? false : i == 4 ? true : false);

Obviously, the even check can be turned into i % 2 == 0, but the code above illustrates two almost identical implementations which makes a fair comparison. Remember our switch takes is a Boolean predicate to do matchings, if you layer that on top of this you can get much closer to a modulo operation based implementation again (however, at the moment we don't have those Match method overloads that accept additional Boolean predicates as in "When" - exercise).

 

It's functional baby!

But why so much talking nagging about constants? Let's take a step back and look where those constants appear: not only in the match but also in the calculation's result. Take a look at the left-most expansion of Fib(4):

Fib(4) --> Fib(3) + Fib(2) --> Fib(2) + Fib(1) + Fib(1) + Fib(0) --> Fib(1) + Fib (0) + Fib(1) + Fib(1) + Fib(0)

Assume the underlined expressions have been evaluated, what's next: the bold one. What's it's value? Fib(1) which we've calculated already before. See the picture? Fib is a function and if there's one thing about functions, it's that they're damn consistent over time. Give Fibonacci 10 as its input today and it will say 89. Assuming the activity of rabbits (inside joke) remains consistent over time, Fibonacci should answer 89 again tomorrow. So, what's the difference between the following two fragments?

static int OldFib(int n)
{
    if (n == 0)
        return 1;
    else if (n == 1)
        return 1;
    else
        return
OldFib(n - 1) + OldFib(n - 2);
}

and

static Pattern<int> _pattern = new Pattern<int>()
       .Match(() => 0, () => 1)
       .Match(() => 1, () => 1)
       .Match((int n) => n, n => Fib(n - 1) + Fib(n - 2))
       .Else(() => { throw new Exception("Unmatched value detected."); });

static int Fib(int n)
{
    return _pattern.Execute(n);
}

They do the same, they look (admittedly, almost, <g>) the same but - although both are equally inefficient because of their implementation nature (a simple loop would be much simpler and better - but remember, this post simply illustrates a concept, Fibonacci is our horse rabbit to bring us there) - the latter one gives us much more power than you might imagine. Functions are invariant over time and although the imperative implementation on top is a function, there's no intelligence in the runtime that can exploit those characteristics. If you ask Fib(10) today and you do that again tomorrow, it will get calculated twice. But also as soon as you ask for Fib(12), which consists of Fib(11) and Fib(10), which consists of Fib(10), Fib(9), Fib(9) and Fib(8), we'll repeat this calculation all over again. Here's a loop that calculates Fibonacci numbers 1 to 35, through OldFib and Fib respectively:

image

The algorithm is killing us and our previously assumed minor costs to call through delegates (and because of recursion calling the Execute method many times during the snowball effect) accumulate quite a bit, resulting in those bad figures. But...

 

Tip of the day: Evaluate only once

Of course it all depends on your input distribution characteristics (i.e. how data is spread and what the likelihood is to calculate the same thing multiple times) and how pure your pattern matching match bodies are (which you could enforce a bit more by making it expression trees too - why?), but by now the reader should think about buzzwords like caching.

For sake of the demo, I've implemented what's likely near to the worst cache: a simple Dictionary. Add this to Pattern<T>:

private Dictionary<object, T> _cache = new Dictionary<object, T>();

There are lots of problems with this: we don't have an aging mechanism to clear the cache, the dictionary keys hold on to objects keeping the GC from doing its job, etc. If you want to see a much better implementation of a GC-friendly dictionary, take a look at WeakHash<TKey, TValue> in Microsoft.Scripting.Utils in the DLR (ships with IronPython). Nevertheless, the Evaluate code will be changed like this:

public T Execute(object o)
{
    ...

    T res;
    if (_cache.TryGetValue(o, out res))
        return res;

    ...

and everywhere a value is calculated through the compiled match or the else clause or even through interpretation, it gets stored in the dictionary before returning it to the caller. The result is provably correct if you don't rely on your match bodies to carry out side-effects (i.e. pure functions) and so, with the caching turned on, the naive algorithm starts to perform incredibly well:

image

even a 100 times better than our reference. Given the number of iterations carried out above (35) and the golden ratio we can predict it would take about 25 seconds (36 * 1.6^8) for 43 iterations, which is doable to show off the dramatic improvement once more:

image

Prediction: accurate (given the fact I used 1.6 instead of the 1.61...). Result of our pattern match variant: hasn't moved a bit. Indeed, when we try to bump up the number of iterations performed, we see something like this:

image

Important: The test above doesn't test for correctness anymore. In fact, overflow for Fibonacci happens from input 46 on. However, the sample run still gives some idea about the number of operations carried out which is what we really care about. Also notice the test loop clears the cache every time it restarts to test with a new number of iterations. Don't even dream about 1000 iterations using the recursive algorithm (roughly 5.4e+196 seconds that would take on my computer) and obviously you'd need some LargeInteger or BigNumber implementation. Of course there's the much better linear algorithm I've been referring to quite a bit (with a few obvious flaws - do a domain check for instance):

static int TheFib(int n)
{
    checked
   
{
        if (n <= 1)
            return 1;

        int prev = 1;
        int curr = 1;

        for (int i = 2; i <= n; i++)
        {
            int t = curr;
            curr += prev;
            prev = t;
        }

        return curr;
    }
}

In the light of the overflow checking, notice the use of the checked keyword, which by the way you can use in our pattern match:

static Pattern<int> _pattern = new Pattern<int>()
       .Match(() => 0, () => 1)
       .Match(() => 1, () => 1)
       .Match((int n) => n, n => checked(Fib(n - 1) + Fib(n - 2)))
       .Else(() => { throw new Exception("Unmatched value detected."); });

I'll leave it to the reader to go and figure out whether or not the difference in performance between the imperative and recursive + cached implementation is still relevant at this point (and obviously there are pros and cons)... However, if I gave you the imperative fragment above without the method signature and asked you what it does, would you be able to state it's Fibonacci in an imperative carnival costume?

 

Enough S (K (S I I)) (S (S (K S) K) (K (S I I))) for tonight. Next time we'll match plain objects again with the MemberInitExpression.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: , ,

Comments

# re: Pattern Matching in C# - Part 4

Thursday, April 10, 2008 4:27 AM by Steve

I think I'm missing something, when will the Else be used?

-1 would match the (int n) => n and it'll just get in a infinite loop.

# re: Pattern Matching in C# - Part 4

Thursday, April 10, 2008 11:19 AM by bart

Hi Steve,

Indeed you'll get a StackOverflowException in that case with the int-based variant. For the moment in this developing story we're missing the concept or guards that can be applied to matches. When we have those, you should be able to write:

.Match((int n) => n, n => n >= 2, n => Fib(n - 1) + Fib(n - 2))

Then the else will handle the invalid input case successfully.

Thanks,

-Bart

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #71

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #71

# Dew Drop - April 11, 2008 | Alvin Ashcraft's Morning Dew

Pingback from  Dew Drop - April 11, 2008 | Alvin Ashcraft's Morning Dew

# Adventures in F# - F# 101 Part 8 (Mutables and Reference Cells)

Tuesday, April 15, 2008 4:09 PM by Matthew Podwysocki's Blog

Time for another adventure in F#, covering some of the basics of functional programming and F# in particular

# Pattern Matching in C# - Part 8

Wednesday, April 16, 2008 2:27 AM by B# .NET Blog

In the last handful of posts in this series we&#39;ve been looking at ways to match a whole set of patterns

# &raquo; Pattern Matching in C# - Part 4 A Match Like Memory: What The World Is Saying About A Match Like Memory

Pingback from  &raquo; Pattern Matching in C# - Part 4 A Match Like Memory: What The World Is Saying About A Match Like Memory

# &raquo; ?? Pattern Matching in C# - Part 4 A Match Like Memory: What The &#8230; A Match Like Memory: What The World Is Saying About A Match Like Memory

Pingback from  &raquo; ?? Pattern Matching in C# - Part 4 A Match Like Memory: What The &#8230; A Match Like Memory: What The World Is Saying About A Match Like Memory