Friday, August 22, 2008 8:28 AM bart

Curry for dummies (Cont’d) – A happy ending

I didn’t intend to make this a series of posts but that’s the way things go. Based on feedback from readers and additional questions raised, one decides that add that little bit more to it and ultimate you’re writing a sequel. Where have we been on this journey? First, we took a look at the abstract concept of currying in Curry for dummies. Next, evaluation strategies were outlined in Curry for dummies cont'd - A note on purity, outlining that formal substitution relies on assumptions about the expression involved, i.e. they better are side-effect free to be able to employ call-by-name semantics as opposed to – the much embraced in the imperative world – call-by-value semantics. To challenge the readers, I wrote this:

Readers looking for a challenge are invited to think about how it would be possible to adapt the original (substitution-driven) Curry for dummies code to work with call-by-need semantics; in other words, how can a lambda expression be applied (the number of arguments provided is not really relevant in this discussion) with arguments that are lazily (one time at most) evaluated? I'll post the answer some time soon.

What I’m suggesting here is to implement call-by-need semantics for applying (in formal terms, using beta-reduction) arguments to functions represented as lambdas. To help with the basic plumbing, I provided the Cell<T> class that looks like this:

class Cell<T>
{
    private bool _assigned;
    private Func<T> _func;
    private T _value;

    public Cell(Func<T> func)
    {
        _func = func;
    }

    public T Value
    {
        get
        {
            if (!_assigned)
            {
                _value = _func();
                _assigned = true;
            }

            return _value;
        }
    }
}

In essence, this class allows us to evaluate a function upon the first call to retrieve its value. For example,

var val = new Cell<long>(() => DateTime.Now.Ticks);
Console.WriteLine(DateTime.Now.Ticks + " " + val.Value);
Thread.Sleep(1000);
Console.WriteLine(DateTime.Now.Ticks + " " + val.Value);

To convince you it really doesn’t change its mind after the initial evaluation, look at the following output:

image

The orange is the capture value and is stable. Obviously it’s later than the green one (why? ironically because of call-by-value semantics to String.Concat in the first Console.WriteLine) and earlier than the red one one second later. Anyway, this is our “smart function evaluator” which obviously relies on the fact that the function it captures doesn’t change its mind. The sample above violates this rule but it provides a way to show you it really works.

Next, let’s take a look at the signature for function application:

static LambdaExpression CallBy____(LambdaExpression func, params Expression[] parameters)

We want to create three such functions:

  • CallByValue
  • CallByName
  • CallByNeed

all of which will have the same signature. The way the functions work is as follows: given a lambda expression and a set of parameters, we’ll return another lambda expression that calls the original function, possibly having some remaining parameters. A sample:

var a = Expression.Parameter(typeof(int), "a");
var b = Expression.Parameter(typeof(int), "b");

var add = Expression.Lambda(Expression.Add(a, b), a, b); // represents (int a, int b) => a + b
var addOne = CallByValue(add, Expression.Constant(1));   // represents (int b) => 1 + b
var three = CallByValue(addOne, Expression.Constant(2)); // represents () => 3

If we’d call through the last function, three, without any parameters, we’d get 3. For call-by-value semantics, it would look like:

() => Invoke(b => Invoke((a, b) => (a + b),1,b),2)

but that will become clearer as we move forward. Time to switch gears and start implementing all of the strategies.

 

Call-by-value

This is the simplest one. We won’t change the function itself (so, no forms of substitution are carried out in the lambda body), rather we’ll call it with the (evaluated) supplied arguments, keeping remaining arguments. Here’s what it looks like:

static LambdaExpression CallByValue(LambdaExpression func, params Expression[] parameters)
{
    if (parameters.Length > func.Parameters.Count)
        throw new InvalidOperationException("Too many parameters specified.");

    ParameterExpression[] lambdaParams = func.Parameters.Skip(parameters.Length).ToArray();
    return Expression.Lambda(Expression.Invoke(func, parameters.Concat(lambdaParams)), lambdaParams);
}

Ignoring the validation, we create a new lambda expression with the remaining parameters, and a body that calls the original lambda with the supplied arguments. A sample:

var a = Expression.Parameter(typeof(int), "a");
var b = Expression.Parameter(typeof(int), "b");

var add = Expression.Lambda(Expression.Add(a, b), a, b); // represents (int a, int b) => a + b
var addOne = CallByValue(add, Expression.Constant(1));   // represents (int b) => 1 + b

The CallByValue call above does the following:

ParameterExpression[] lambdaParams = func.Parameters.Skip(parameters.Length).ToArray();

becomes { a, b }.Skip(1).ToArray() = { b }. So, the lambdaParams variable contains the remaining parameters. Next, it returns a new lambda expression that does the following:

Expression.Invoke(func, parameters.Concat(lambdaParams)

which is semantically equivalent to func(1, b). To clarify this, the parameters are 1 (which is passed in as Expression.Constant(1)) and b, the remaining parameter(s). More precisely, we keep the number of arguments to call the original lambda the same, but we’ve concretized one of them (i.e. a := 1). This lambda body is now wrapped in a new lambda that only has b as a parameter. In the end it looks like (pseudo-code):

b => ((a, b) => a + b)(1, b)

where the outer b is different from the inner b (they have different scope) and the binding of a with the constant 1 is pointed out. The ToString result of the produced lambda is:

b => Invoke((a, b) => (a + b),1,b)

What are the characteristics of this approach?

  • Arguments are evaluated before calling the function, which means they are evaluated exactly once.
  • As an implication of the previous, arguments that are not used in the body of the function yield redundant evaluations.
  • An argument that evaluates to the “denotational bottom” (e.g. throws an exception or doesn’t terminate) prevents the call from being made (bail out early, but maybe – see previous point – for no real reason).
  • On the positive side, the technique is easy to understand and compatible with lots of languages (e.g. for interop reasons).

 

Call-by-name

Slightly more complicated but still easy to understand is call-by-value. The idea here is to substitute occurrences of the parameters in the function’s body to create a new function. To put it another way, we don’t wrap the original function but clone it and rewrite it as a new function with fewer arguments. For the same sample:

var a = Expression.Parameter(typeof(int), "a");
var b = Expression.Parameter(typeof(int), "b");

var add = Expression.Lambda(Expression.Add(a, b), a, b); // represents (int a, int b) => a + b
var addOne = CallByName(add, Expression.Constant(1));   // represents (int b) => 1 + b

the comments on the right actually point out literally what’s happening. Indeed, a ToString of the resulting lambda looks like:

b => 1 + b

It’s important to realize that the ‘1’ here could be an arbitrary complex expression that’s put in place wherever ‘a’ was found (because of the binding to the parameter). So, if you’d have the following:

var a = Expression.Parameter(typeof(int), "a");
var twice = Expression.Lambda(Expression.Add(a, a), a); // silly way to double things: (int a) => a + a
var six = CallByName(twice, Expression.Add(Expression.Constant(1), Expression.Constant(2))); // calling twice with argument (1 + 2)

we’d get, by formal substitution of a in (int a) => a + a,

() => (1 + 2) + (1 + 2)

while call by value would evaluate the (1 + 2) first, resulting in:

() => 3 + 3

How do we implement it? The original Curry for dummies post introduces the use of the expression visitor which we inherit from to create an expression tree cloner that replaces parameter occurrences by a given expression like this:

class CurryVisitor : ExpressionVisitor
{
    private Dictionary<ParameterExpression, Expression> _parameters;

    public CurryVisitor(Dictionary<ParameterExpression, Expression> parameters)
    {
        _parameters = parameters;
    }

    protected override Expression VisitParameter(ParameterExpression p)
    {
        if (_parameters.ContainsKey(p))
            return _parameters[p];

        return base.VisitParameter(p);
    }

    public Expression Clone(Expression exp)
    {
        return base.Visit(exp);
    }
}

Using this tool – that we’ll reuse in the call-by-need implementation as well – we can write call-by-name like this:

static LambdaExpression CallByName(LambdaExpression func, params Expression[] parameters)
{
    if (parameters.Length > func.Parameters.Count)
        throw new InvalidOperationException("Too many parameters specified.");

    var assignments = new Dictionary<ParameterExpression, Expression>();

    for (int i = 0; i < parameters.Length; i++)
    {
        ParameterExpression parameter = func.Parameters[i];
        Expression value = parameters[i];

        if (!parameter.Type.IsAssignableFrom(value.Type))
            throw new InvalidOperationException("Incompatible parameter value type for parameter " + parameter.Name);

        assignments[parameter] = value;
    }

    var visitor = new CurryVisitor(assignments);
    return Expression.Lambda(visitor.Clone(func.Body), func.Parameters.Skip(parameters.Length).ToArray());
}

The most important thing here is that we go through all of the specified parameters (in the sample above that’s the expression (1 + 2)), hand-in-hand with the parameter expressions that correspond to it, adding them to a dictionary. For our sample, the dictionary will contain

a –> (1 + 2)

which is then fed in to the CurryVisitor to clone the original lambda’s body (i.e. a + a) while replacing the dictionary key occurrences (a) by the mapped substitution (1 + 2). The new body will be (1 + 2) + (1 + 2), which is hooked up as the body for the new lambda expression with the remaining arguments (in this case none at all) in precisely the same way as in call-by-value.

What are the characteristics of this approach?

  • Arguments are not evaluated when they are not needed in the function body for the particular invocation.
  • On the sunny side, too, fatal side effects (denotational bottom) have no effects for unused arguments.
  • But … arguments may be evaluated multiple times, which means side-effects would be multiplied.
  • Also, in order to do the substitution we need to gain access to the function itself, which would be a runtime (in the two senses of the word) task.

 

Call-by-need

As mentioned in Curry for dummies cont'd - A note on purity, call-by-need is an improvement on top of call-by-name. It follows the same principles of substitution but it employ memoization, a caching technique to remember the evaluation result for an expression, to avoid reevaluation expressions. In other words, it eliminates common subexpressions that are introduced by formal parameter substitution. To continue with the same sample, consider:

var a = Expression.Parameter(typeof(int), "a");
var twice = Expression.Lambda(Expression.Add(a, a), a); // silly way to double things
var six = CallByNeed(twice, Expression.Add(Expression.Constant(1), Expression.Constant(2))); // calling twice with argument (1 + 2)

What happens now though is something I’m going to denote using some special syntax. When evaluating twice, it’s rewritten as:

() => [| () => 1 + 2 |] + [| () => 1 + 2 |]

The cool thing about denotational semantics is you can invent your own symbols :-). Here I’m introducing what I’d call pillar syntax: i.e. [| … |]. The body of it denotes a parameter-less function using lambda syntax. In practical terms this means we have a delay-loaded or lazy-loaded value: it’s only calculated on-demand. Now by means of common subexpression elimination, the compiler can rewrite this as follows (relying on the fact that the function would be pure, i.e. there are no side-effects that call for the need of doing the evaluation every time again):

x = [| () => 1 + 2 |]
() => x + x

The semantics are as follows: when declared ("stage 0” if you want), the function is not evaluated, it’s simply cached. Next, when evaluated for the first time (“stage 1”), the function is called and the value is kept. I’m not building a formal specification here (I’m defining the “valuation function” in words after all), but realize that I’m introducing some state mutation here (let’s not get into monads and beasts like unsafePerformIO this time). On subsequent evaluations, the function isn’t called but the cached value is returned straight away (“stage > 1”). So, applying our function above becomes:

six()
= let x = [| () => 1 + 2 |] in x + x
= [| () => 1 + 2 |] + [| () => 1 + 2 |]
= [| () => 3 |] + [| () => 1 + 2 |]
= [| 3 |] + [| 3 |] 'here mutation happens; x is a “reference” variable
= 6

This is precisely what our Cell<T> does. How can we make this work though? What we need is an “environment” in which we keep all the constructed cells. Such an environment establishes bindings between parameters and the corresponding cell that captures the bound expression lifted in a function. This needs some clarification. When calling CallByNeed in the sample above, the argument (1 + 2) needs to be wrapped in a parameter-less function to make it “on-demand” evaluated. The parameter to call “twice” with is:

Expression.Add(Expression.Constant(1), Expression.Constant(2)) // 1 + 2

and needs to become

Expression.Lambda(Expression.Add(Expression.Constant(1), Expression.Constant(2))) // () => 1 + 2

Next, this lambda needs to be wrapped in a Cell<T>. T is inferred from the type of the parameter, i.e. int, and the constructor of Cell<T> takes in a Func<T>. We can turn a LambdaExpression into a delegate by calling Compile on it. We do this for all the parameters (in the same we only have one) and keep the resulting cells in a dictionary mapping the name of the parameter (a string, in this running sample “a”) onto the corresponding cell. This piece of voodoo plumbing ensures that we have just one reference to a cell per parameter and that’s the whole point of the memoization: we don’t want to reevaluate parameters when they occur in the function body multiple times. To put this in terms of the sample:

var environment = new Dictionary<string, object>();

environment["a"] = new Cell<int>(() => 1 + 2);

One more question to answer now is how to carry out formal substitution? More specially, what should we replace ParameterExpression occurrences with during the “tree visitation” by the CurryVisitor? The answer isn’t too hard either: a lookup into the environment dictionary, returning a Cell<T>, on which we can call Value causing either evaluation (“stage 1”) or return of the memoized value (“stage > 1”):

((Cell<int>)environment["a"]).Value

In other words, our call to twice returns an expression tree like this:

() => ((Cell<int>)environment["a"]).Value + ((Cell<int>)environment["a"]).Value

You’ll wonder where environment is kept – the answer is in some kind of “closure” that’s created when the lambda expression is compiled, but you can consider this to be an irrelevant detail at this point (if you’re really curious, invoke the Method property on the generated delegate and observe the passed-in ExecutionScope object).

Time to dive in the code:

static LambdaExpression CallByNeed(LambdaExpression func, params Expression[] parameters)
{
    if (parameters.Length > func.Parameters.Count)
        throw new InvalidOperationException("Too many parameters specified.");

    var environment = new Dictionary<string, object>();
    var assignments = new Dictionary<ParameterExpression, Expression>();

    MethodInfo envItem = typeof(Dictionary<string, object>).GetMethod("get_Item");

    for (int i = 0; i < parameters.Length; i++)
    {
        ParameterExpression parameter = func.Parameters[i];
        Expression value = parameters[i];

        if (!parameter.Type.IsAssignableFrom(value.Type))
            throw new InvalidOperationException("Incompatible parameter value type for parameter " + parameter.Name);

        LambdaExpression lazyValue = Expression.Lambda(value);
        object cell = typeof(Cell<>).MakeGenericType(value.Type).GetConstructors()[0].Invoke(new object[] { lazyValue.Compile() });

        environment[parameter.Name] = cell;
        assignments[parameter] = Expression.Property(Expression.Convert(Expression.Call(Expression.Constant(environment), envItem, Expression.Constant(parameter.Name)), typeof(Cell<>).MakeGenericType(value.Type)), "Value");
    }

    var visitor = new CurryVisitor(assignments);
    return Expression.Lambda(visitor.Clone(func.Body), func.Parameters.Skip(parameters.Length).ToArray());
}

This is roughly the same as the CallByName implementation (indeed, some refactoring could be done), but the most notable differences (indicated in bold) are:

  • Besides assignments, we’re building up an environment which consists of mappings from parameter.Name strings onto cell objects capturing the lambda built around the parameter expression.
  • The assignments themselves are property .Value calls on the result of retrieving the cell from the environment through the get_Item dictionary indexer, passing in the parameter.Name as the key, and casting the result to Cell<T>.

The substitution using the CurryVisitor is the same as before, only the mappings of ParameterExpressions onto the replacement values are different.

What are the characteristics of this approach?

  • Same as call-by-name but:
    • Arguments are evaluated once at most, which brings it closer to call-by-value compared to call-by-name.
    • However, formal substitution is still the driving mechanism here.
  • Also notice we need access to the implementation of the function to do the substitution. This access is not shallow – if a lambda calls into another function, the laziness of arguments needs to be propagated all the way through. When you hit the border where a call into the non-pure lazy world is needed (e.g. interop code), the landscape changes into call-by-value with eager evaluation of the arguments at the call-site. However, laziness is still beneficial because the code in between the function entry point and the interop code much lower in the stack (including OS function calls that are typically call-by-value) might be separated by a bunch of code paths that do not necessarily need to carry on all of the arguments of the original function. In other words, a (not necessarily proper) subset of the input arguments has the potential of getting discarded without any need for evaluation while others will be “degraded” to forced eagerness.

 

The final test

To test this stuff, I wrote this little test framework:

static Dictionary<string, Func<LambdaExpression, Expression[], LambdaExpression>> s_tests
= new Dictionary<string, Func<LambdaExpression, Expression[], LambdaExpression>> { { "By Value", CallByValue }, { "By Name", CallByName }, { "By Need", CallByNeed } }; static void Test(string testName, LambdaExpression func, params Expression[] parameters) { Test(testName, null, func, parameters); } static void Test(string testName, Action cleanUp, LambdaExpression func, params Expression[] parameters) { Console.WriteLine(testName); Console.WriteLine(new string('-', testName.Length)); foreach (var test in s_tests.Keys) { LambdaExpression res = s_tests[test](func, parameters); try { // this assumes all parameters have been supplied Console.Write(test + " -> " + res + " = ");
Console.WriteLine(res.Compile().DynamicInvoke()); } catch (Exception ex) { Console.WriteLine(ex.InnerException.Message); } if (cleanUp != null) cleanUp(); } Console.WriteLine(); }

As we’re in a functional mood, the use of a dictionary with delegates shouldn’t be too surprising. In essence, s_tests contains mappings of friendly names to delegate instances pointing at each of the CallBy* methods (which all have the same signature). Notice the use of C# 3.0 collection initializers to build up this dictionary. The test execution harness is simple: it iterates over all of the directory entries, extracts the test and calls it with the given lambda expression and the given parameters (assuming all parameters are “eaten” so that we can call DynamicInvoke without passing in any remaining arguments). There’s also support to clean up shared mutable state which comes in handy to compare results.

Now the tests itself:

static void Main()
{
    var a = Expression.Parameter(typeof(int), "a");
    var b = Expression.Parameter(typeof(int), "b");
    var add = Expression.Lambda(Expression.Add(a, b), a, b); // represents (int a, int b) => a + b
    var one = Expression.Constant(1);
    var two = Expression.Constant(2);
    Test("Add test", add, one, two);

    var bad = Expression.Divide(one, Expression.Constant(0));
    var twice = Expression.Lambda(Expression.Add(a, a), a, b); //doesn't use b
    Test("Eagerness test", twice, one, bad);

    var sideEffect = Expression.Call(typeof(Program).GetMethod("ChangingMind", BindingFlags.NonPublic | BindingFlags.Static), null);
    Test("Laziness test", () => { s_i = 0; }, twice, sideEffect, one /* not used */);
}

static int s_i = 0;

static int ChangingMind()
{
    return s_i++;
}

The first test deals with a pure function – we expect all strategies to come up with the same answer. The second one passes in an unused expression that results in a divide by zero. Call-by-value will crash while the others won’t. Finally, the side-effecting test calls out to a function that has mutating state (which we clean up after every iteration of the test bench since we want to see the effects on a per-strategy basis). This is a bit more subtle. Call-by-value will just call ChangingMind once, before calling the twice function. Call-by-name will duplicate the ChangingMind calls, while call-by-need will still do in-place substitution but because of the memoization with Cell<T> the ChangingMind function is only called once when the parameter is really needed. In fact, it would be a good exercise (left to the reader) to combine the last two tests. Here’s the output:

image

The results are as expected. Mission accomplished. You can download the code here.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under:

Comments

# Dew Drop - August 23, 2008 | Alvin Ashcraft's Morning Dew

Saturday, August 23, 2008 7:17 AM by Dew Drop - August 23, 2008 | Alvin Ashcraft's Morning Dew

Pingback from  Dew Drop - August 23, 2008 | Alvin Ashcraft's Morning Dew

# WMOC#17 - PhotoSynth, Reflector move forward - Service Endpoint

Pingback from  WMOC#17 - PhotoSynth, Reflector move forward - Service Endpoint

# Recent Faves Tagged With "monads" : MyNetFaves

Saturday, October 04, 2008 1:36 AM by Recent Faves Tagged With "monads" : MyNetFaves

Pingback from  Recent Faves Tagged With "monads" : MyNetFaves

# Recent Links Tagged With "sequel" - JabberTags

Sunday, January 11, 2009 8:34 AM by Recent Links Tagged With "sequel" - JabberTags

Pingback from  Recent Links Tagged With "sequel" - JabberTags

# Recent Links Tagged With "monads" - JabberTags

Thursday, April 30, 2009 6:34 PM by Recent Links Tagged With "monads" - JabberTags

Pingback from  Recent Links Tagged With "monads" - JabberTags