Saturday, November 22, 2008 1:53 PM bart

Dude, Where’s My LINQ DML?

Introduction

On last week’s TechEd EMEA Developers 2008 conference in Spain I redelivered my talk on writing custom LINQ providers, showing off implementations of LINQ to AD and LINQ to SharePoint. One of the questions I received afterwards went along the lines of this blog post’s title: “Dude, where’s my LINQ DML statement support?”. The stock answer to this kind of question points out that it’s up to individual LINQ providers to provide reasonable support for updating data source entries based on the O/whatever mapping they establish. The typical way this is done is through change tracking of entity objects that have been new’ed up by the provider upon execution of the query through LINQ.

However, there are cases where this kind of updating is unsatisfactory, especially when dealing with batch updates that don’t require client-side input or computation. For example, when dealing with interactive users, the change tracking driven solution works great, but when updating a set op records based on a certain update pattern (e.g. increase the price of each product with 5%) there’s no need to suck in all entity objects from the data source only to send them back with updates that could well be executed by the target database. Obviously it all depends on the richness of the underlying database’s DML statements, but assuming it’s flexible enough, how to deal with this in the context of LINQ?

 

Updateable LINQ to Objects

In order to set the scene, let’s investigate what it would take to make LINQ to Objects update-capable. In other words, we want to be able to execute a query and update the retrieved objects in a declarative fashion, i.e. without manual iteration over the query results. To tackle this problem, we can rely on extension methods over IEnumerable<T> in order to provide a method that can apply a certain update-action to each individual object in the sequence. Here’s a possible implementation.

public static void Update<T>(this IEnumerable<T> source, params Action<T>[] updates)
{
    if (source == null)
        throw new ArgumentNullException("source");

    if (updates == null)
        throw new ArgumentNullException("updates");

    foreach (T item in source)
    {
        foreach (Action<T> update in updates)
        {
            update(item);
        }
    }
}

This particular implementation will break the IEnumerable<T> chain as the method is void-returning, but it’s not hard to make it an iterator, yield returning the original sequence elements from the outer loop after the updates have taken place. However, using an iterator in this case might not be the best idea, as the update won’t go through till you start iterating over the resulting sequence because of the lazy nature iterators have. So, let’s stick with void. Next, you might wonder why I chose to pass in an array of update actions instead of a just a single update action. It turns out this doesn’t really matter that much for the in-memory LINQ to Objects case (and indeed, if you just specify a single update action, it will just work fine) but as we’ll see further on, having more manageable individual update actions is a good idea to remote the DML statement. However, in this particular case there’s no way to have more control over the nature of updates, as an Action<T> can be used to point at any procedure, e.g. specified using a statement lambda. Below, I’m illustrating the use of the Update method with more granular update actions:

List<Product> products = new List<Product>() {
    new Product { Name = "Chai", Price = 123m },
    new Product { Name = "Chang", Price = 234m },
};

(from p in products
 where p.Price > 100
 select p).Update(
    p => p.Price *= 0.95,
    p => p.Name = p.Name.ToUpper()
);

foreach (var p in products)
    Console.WriteLine(p.Name + " - " + p.Price);

Notice how other LINQ to Objects operators are used indirectly through query comprehension syntax, more specifically we filter out the items to be updated using a where clause first. As Update performs eager evaluation, enumerating over the source sequence straight away, the results will be visible immediately when iterating over the (now updated) source sequence after the Update invocation. But let’s take a closer look at how we’re specifying the updates above:

p => p.Price *= 0.95

Here we’re using lambda syntax but notice how the specified lambda doesn’t have a statement body. We’re simply leveraging the fact any value can be ‘converted’ to void, so the lambda above which is of type Product –> decimal can be assigned to an Action<T> where T is Product or, in functional signature syntax, a Product –> void. But there’s nothing that keeps us from writing a far more complex statement body:

p => {
   decimal oldPrice = p.Price;
   string oldName = p.Name;
   p.Price = oldPrice * 0.95;
   p.Name = oldName.ToUpper();
}

Just let your imagination work. Why is this a relevant fact? Our original set of update actions are far more “comprehensible” at runtime for introspection (well, if we turn them in something else than Action<T> which doesn’t produce just plain old IL code) because of their declarative nature.

There are still quite a few things lacking from our simplistic LINQ to Objects update support mechanism, such as transactional integrity (e.g. assume one of the update actions blows up, how to roll back previous applied updates?) but you get the overall idea.

 

Updateable remotable LINQ

Now, let’s turn the scene and focus on remotable kinds of LINQ, where we want to send off the DML statement to a remote component, e.g. a DBMS system, to execute the update statement. In order to accomplish this goal, we need a way to represent the update statement in some form that we can inspect at runtime to cross-translate it into the remote DML language, e.g. SQL. Luckily we have such a capability in the framework today, with expression trees.

But what’s in a name? Yes, expression trees. So, we don’t have the ability to represent arbitrary pieces of procedural statement-driven code like we have above with Action<T>:

Action<int> a = i => { Console.WriteLine(i); }; // works
Expression<Action<int>> e = i => { Console.WriteLine(i); } // fails to compile

On the one hand, this is a pity as it limits our expressiveness, but on the other hand it’s a good thing as analyzing arbitrary statement trees would significantly boost the complexity of the translator we’re about to write. However, there’s one significant limitation imposed by this: our update ‘actions’ cannot contain an assignment operator as this can’t be expressed by an expression tree:

Expression<…> e = p => p.Price *= 0.95;

So, even if we’d find a way to specify expression trees to build up a DML statement (see further), we can’t make assignments which obviously is a pretty basic piece of functionality when dealing with updates. So, how to work around this? What about the following? Instead of representing an update ‘action’ as one function, we could represent it as two functions: one that acts as a “update target selector” denoting which entity property to update (e.g. p => p.Price) and one that calculates the new value (e.g. p => p.Price * 0.95). Together these two carry enough information to express the intent of the update:

Expression<Func<Product, decimal>> extract = p => p.Price;
Expression<Func<Product, decimal>> update = p => p.Price * 0.95;

This is a little unfortunate and although there are alternatives (e.g. cooking up some “Updater” object that carries both extractor and updater) most of these yield syntactical clutter, so the bets of having a signature as easy as the following are off:

public static void Update<T>(this IEnumerable<T> source, params Expression<Action<T>>[] updates)

Using an Updater object, we’d end up with:

public static void Update<T, R>(this IEnumerable<T> source, params Updater<T, R>[] updates)

but calling it would be cumbersome:

Update(
    new Updater<Product, decimal>(p => p.Price, p => p.Price * 0.95),
    new Updater<Product, string>(p => p.Name, p => p.Name.ToUpper())
)

where Update has a constructor taking in:

class Updater<T, R>
{
    public Updater(Expression<Func<T,R>> extract, Expression<Func<T,R>> update)

Ideally, we’d have a way to use tuples to keep the two function expression trees together at all times, also being able to use them in a params:

public static void Update<T, R>(this IEnumerable<T> source, params Tuple<Expression<Func<T, R>> /* extract */, Expression<Func<T, R>> /* update */>[] updates)

and hence the ability to call them easily using some hypothetical call-site syntax:

Update(
    (p => p.Price, p => p.Price * 0.95),
    (p => p.Name, p => p.Name.ToUpper())
)

or even without the additional parentheses pairs (at this time I’m even not thinking about disambiguation, overload resolution, betterness rules, and all this goodness). Or, we could lift the common Expression part out of the tuples, ending up with:

public static void Update<T, R>(this IEnumerable<T> source, params Expression<Func<T, Tuple<R /* extract */, R /* update */>>>[] updates)

Internally, the Update method could now inspect the way the Tuple<R,R> is instantiated, extracting the expression trees for the extract and update parts. However, we’re not all too pleased with the calling syntax that would need to new up tuple objects, obfuscating the intent of the code.

Let’s go over our needs and observations one more time before we settle on some design:

  • We’re bound to the limitations of expression trees; hence:
    • we cannot express an individual update action as one expression (no assignment operation supported);
    • so, an update action needs to be split into an extract and update phase;
    • and, by extension, an update action can only target one value on the entity object at a time.
  • It should be possible to specify multiple update actions at a time to execute them as one unit:
    • this supports transactional updates;
    • we need a way to batch up update actions.

 

Introducing IUpdateable<T>

By now, we all know about the mysterious IQueryable<T> interface of LINQ. I’ve blogged about it several times, but here’s the key take-away: IQueryable<T> represents a query, wrapping an expression tree denoting the semantics of the intended query, and providing iteration facilities by means of its IEnumerable<T> parent. This, however, only tells half of the story, namely the “IQuery” part. The other half, the “able” part, is not visible on the interface per se, and allows users to extend the query by applying additional query operators that are supplied by means of the extension method in the Queryable class, each of which represents a query operator, like Where, Select, OrderBy, etc. This design allows the implementer of the interface to focus on the query execution part rather than the query construction part which is totally done on behalf of the implementer by means of the extension methods.

One way to think about IQueryable<T> in general is as a giant state machine. Most of the IQueryable<T> extension methods return an IQueryable<T>, so applying the method (i.e. some query operator) puts us back in the domain of a (new) IQueryable<T>. However, there are other methods like AsEnumerable that allow us to switch to another world, the one of IEnumerable<T>, where we have similar extension methods. To provide update support, we can play a similar trick:

  • Users write the base query (typically only using filtering) to extract the records that need to be updated – this happens in the IQueryable<T> domain.
  • Next, we transition out of the IQueryable<T> into the IUpdateable<T> domain to provide update support.
  • In IUpdateable<T>, updates can be batched up by calls to a Set method, specifying individual update actions, returning a new IUpdateable<T>.
  • Once all update actions have been specified, an Update method can be called to execute the batch of updates.

A few questions and remarks:

  • Why starting for IQueryable<T>? In quite some cases, users will want to filter records before applying updates. Right, so what about just having a Where method on IUpdateable<T>? Well, it turns out other operators might be useful as well, such as ordering in case the update would be used to number items based on their ordering (left to the reader as an exercise to think about ways to provide this capability in IUpdateable<T>). Similarly, things like Take could be translated into TOP clauses to apply the update only to the first number of items. And so on. In the end, providing all IQueryable<T> operators first, allowing the user to transition into IUpdateable<T> seems beneficial although some operators likely don’t make much sense (like Select, which would result in updating a projection…).
  • How to iterate over the resulting records, after the update has been applied? This kind of combined approach seems attractive too, but we won’t go there in the scope of the blog post. However, providing this feature shouldn’t be too hard, by means of an AsQueryable method, or by making IUpdateable<T> also IEnumerable<T>, where an iteration triggers execution of the update followed by iteration over the original query that got captured by AsUpdateable originally. This too is left as an exercise for the reader.

Time to present our new interface:

interface IUpdateable<T>
{
    Expression Expression { get; }
    IUpdateProvider Provider { get; }
}

It pretty much follows the design of IQueryable<T> where the Expression property captures the expression tree for the update so far, and the Provider property is used by the extension methods to gain access to a factory method to new up new IUpdateable<T> objects:

interface IUpdateProvider
{
    IUpdateable<T> CreateUpdate<T>(Expression expression);
    int Update(Expression expression);
}

In addition, the provider supplies an Update method that will take in the constructed updateable object’s expression tree, carrying out the update and returning the number of affected records. This one can be compared to the Execute method on IQueryProvider. So, how to use those two interfaces now? All of the direct uses will be provided by means of extension methods:

public static class Updateable
{
    public static IUpdateable<T> AsUpdateable<T>(this IQueryable<T> source)
    {
        IUpdateProvider updateProvider = source.Provider as IUpdateProvider;
        if (updateProvider == null)
            throw new InvalidOperationException("Update operations not supported by this query provider.");

        return updateProvider.CreateUpdate<T>(
            Expression.Call(
                null,
                ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(T)),
                new Expression[] { source.Expression }
            )
        );
    }

    public static IUpdateable<T> Set<T, R>(this IUpdateable<T> source, Expression<Func<T, R>> extract, Expression<Func<T, R>> update)
    {
        return source.Provider.CreateUpdate<T>(
            Expression.Call(
                null,
                ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(T), typeof(R)),
                new Expression[] { source.Expression, extract, update }
            )
        );
    }

    public static int Update<T>(this IUpdateable<T> updateable)
    {
        return updateable.Provider.Update(updateable.Expression);
    }
}

Here the first method, AsUpdateable, captures an existing IQueryable<T> and checks whether its provider (of type IQueryProvider) supports updates. This piggybacking on top of the existing provider infrastructure is a good thing in general as implementers of an update-capable provider will need to talk to the query provider in order to translate the query portion of the update statement (e.g. a WHERE clause). Being the flip side of it, this means it’s not really possible to sprinkle update support on top of an existing query provider without modifying that provider. You could imagine an additional overload to AsUpdateable that takes in an IUpdateProvider object, and though this would work you’d end up implementing some of the query operators yourself again (e.g. to support filtering). If you know the original query provider’s concrete class and you know it exposes a way to take in an expression tree representing a query, getting back the translated query, there might be a way to avoid duplication (provided the target query language is compositional in nature):

var provider = (SomeSqlQueryProvider)source.Provider;
string sql = provider.GetSqlStatement(source);

string update = “UPDATE “ + … + “ FROM (“ + sql + “)”; // pseudo-SQL

But let’s ignore this for now. Other than this, AsUpdateable is straightforward, it takes the original expression tree and wraps it in a MethodCallExpression for the current method. Just like IQueryable<T> does this, the original expression tree gets extended with information about the applied operators represented as method calls.

The Set method, representing an individual update action, is completely similar. More interesting is its signature: besides of an IUpdateable<T> object, it takes in both the extract and update functions as expression trees, using the principle we talked about earlier.

Finally, Update is just syntactical sugar on top of an IUpdateable<T> to call into the underlying provider to carry out the update, allowing for a fluent interface design. Ultimately, all of this can be used as follows:

(from p in new Table<Product>()
 where p.Price > 100
 select p).AsUpdateable()
    .Set(p => p.Price, p => p.Price * 0.95)
    .Set(p => p.Name, p => p.Name.ToUpper())
.Update();

Going out in the woods we could start dreaming about integrated syntax that looks like this:

from p in new Table<Product>()
where p.Price > 100
update p.Price *= 0.95,
       p.Name = p.Name.ToUpper();

which might automatically become eager, i.e. the Update call is implicit, or stays lazy (which would mean one can reuse the query/update comprehension expression). VB syntax could be very similar, optionally using the With keyword (Update p With …).

Enough dreaming for now, this whole expression (more specifically everything before the .Update() part) translates into an expression tree, constructed at runtime with the aid of the Queryable query operators and Updateable update operators, that can be represented as follows using ToString notation:

expression = {value(Table`1[Product]).Where(p => (p.Price > 100)).AsUpdateable().Set(p => p.Price, p => p.Price * 0.95).Set(p => p.Name, p => p.Name.ToUpper())}

Now the update provider can go ahead and cross-translate this expression tree into a statement the target data source understands, e.g. a SQL DML statement for UPDATE (I’m not a SQL language expert, so treat the following as conceptual code):

UPDATE Products WHERE Price > 100 SET Price *= 0.95, Name = TOUPPER(Name)

I’ll leave a concrete implementation of an update provider to the inspired reader; a basic prototype for SQL (only allowing columns to be updated with constant string or integer values) worked like a charm. In addition, the reader should feel free to think about how it would be possible (if desirable at all, something to think about as well) to make the “LINQ to Objects with update support” case similar to the remotable case, in terms of used operators (i.e. our current LINQ to Object implementation in the first paragraph doesn’t use Set method calls and isn’t lazy at all).

Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# Arjan`s World &raquo; LINKBLOG for November 22, 2008

Saturday, November 22, 2008 2:43 PM by Arjan`s World » LINKBLOG for November 22, 2008

Pingback from  Arjan`s World    &raquo; LINKBLOG for November 22, 2008

# Dew Drop &ndash; Weekend Edition &ndash; November 22-23, 2008 | Alvin Ashcraft's Morning Dew

Pingback from  Dew Drop &ndash; Weekend Edition &ndash; November 22-23, 2008 | Alvin Ashcraft's Morning Dew

# Updateable LINQ

Monday, November 24, 2008 2:21 AM by Marco Russo

Bart de Smet wrote a very interesting post about the possible creation of an Updateable LINQ provider,

# Updateable LINQ

Monday, November 24, 2008 2:21 AM by SQLBI - Marco Russo

Bart de Smet wrote a very interesting post about the possible creation of an Updateable LINQ provider,

# re: Dude, Where’s My LINQ DML?

Monday, November 24, 2008 9:16 AM by Jim Wooley

While I like the concept and have heard the same criticisms for some time, the problem with "IUpdatable" is there is already a conflicting implementation in Data Services ("Astoria"). Besides, since when does LINQ really need a real interface to acomplish it's magic. It would just need the extensions to be available and recognized by the compiler as you pointed out. The same could be said for batch deletes and potentially batch inserts.

# re: Dude, Where’s My LINQ DML?

Monday, November 24, 2008 10:13 AM by bart

Hi Jim,

You're definitely right about the "conflicting" implementation in Data Services. After all this is just a thought-exerpiment rather than a full-fledged design.

Concerning the use of an interface different approaches can be taken. Starting from the front-end language patterns, an update would need to look like a fluent chain of calls that include everything IQueryable-ish (or a limited set as explained in my post since some operators like Select don't make much sense in the context of an update) followed by a series of Set calls, topped off by a "commit" call depending on whether the implementation is supposed to be lazy (which would be good for consistency with every other pattern starting with "from", although there are already precedents when talking about First, Single, etc) or eager. Below is an in-the-middle approach:

(from x in y where x.z update x.y *= 2).Commit();

However, just partying on IQueryable is a little problematic as it doesn't make sense to transition out of the chain of Set calls, back into the query operators (or does it? this become a philosophical question), hence my preference to delimit the world of updates by means of another interface which is merely there to bring order in the otherwise unordered soup of operators.

So, in the end, for the language pattern the answer is definitely: you don't need any interface whatsoever, it should "just work" with any method it can find just like LINQ does now. However, on the side of providers the interface approach can help to yield more order and structure.

Thanks,

-Bart

# re: Dude, Where’s My LINQ DML?

Tuesday, November 25, 2008 2:57 AM by Kristofer

Hi Bart,

I wrote a L2S-based implementation of something similar to this a while back - take a look at this:

blog.huagati.com/.../architecture-linq-to-sql-and-set-based-operations-update-statements

# Designing an airline passenger reservation system &raquo; Architecture - Linq-to-SQL and set-based operations: Delete statements

Pingback from  Designing an airline passenger reservation system &raquo; Architecture - Linq-to-SQL and set-based operations: Delete statements

# LINQ: .NET Language-Integrated Query | Code Source

Monday, November 28, 2011 6:11 PM by LINQ: .NET Language-Integrated Query | Code Source

Pingback from  LINQ: .NET Language-Integrated Query | Code Source

# LINQ: .NET Language-Integrated Query | Code Source

Monday, November 28, 2011 6:11 PM by LINQ: .NET Language-Integrated Query | Code Source

Pingback from  LINQ: .NET Language-Integrated Query | Code Source