Monday, June 23, 2008 12:41 AM bart

LINQ to MSI - Part 2 - Queryable without an I

In the last couple of posts in this series, we established the interop baseline that the query provider layer, which we're about to build now, sits on top of. Since the goal of this blog series is to live an IQueryable-free life, for reasons explained earlier including demo-simplification goals, this post introduces a series of "fluent classes" that provide a limited query pattern.

 

The query provider

At the entry point of our API, we provide a query provider class. Its sole role (at least for now) is to provide access to the underlying MSI database through the MsiConnection object introduced previously. If one wants to support update tracking on the objects retrieved through the provider, the query provider context is the best place to store this tracking information, since it can keep a global view of the different tables and their underlying relationships. Since MSI has some notion of relationships between tables, such an implementation could be a good idea. Obviously one can also avoid update tracking altogether by updating records in the database directly; after all, we don't pay a huge price for the "connection" since the MSI database is a local file (i.e. no network traffic cost etc). However, if the end-user expects transactional semantics when updating multiple records as a way to ensure database consistency, the global view of a context object is a plus as well (although one has to take the consistency mechanisms provided by the underlying database into account; see MSDN on the Windows Installer Database to get an idea of the capabilities of the MSI database engine).

To summarize this discussion, we'll provide a query provider object as the entry-point, from which we'll derive to provide access to the various tables. Users of the API can derive from the context themselves to provide access to other custom tables as well. Here's how it looks:

public abstract class MsiQueryProvider : IDisposable
{
     private MsiConnection _conn;

     protected MsiQueryProvider(string file)
     {
          _conn = new MsiConnection(file);
     }

     public void Dispose()
     {
          if (_conn != null)
          {
               _conn.Dispose();
               _conn = null;
          }
     }
}

 

Representing tables

Next we have to talk about tables. MSI tables are fairly simple; they just have a name and there's not much more interesting metadata associated with them (apart from the fields they hold of course). One caveat though is that the names are singular (like Property), so it might be tempting to introduce some metadata through custom attributes to create a mapping. That would work, but to simplify matters we'll just have the name of the entity class stand for the table's name. Where does the entity class come into play? A generic parameter does the trick:

public class MsiTable<T> : IEnumerable<T>
{
    ...
}

Given the following entity class:

public class Property
{
     [MsiColumn("Property")]
     public string Name { get; set; }

     public string Value { get; set; }
}

we can construct a database context for our MSI database as follows:

public class MyMsi : MsiQueryProvider
{
     public MyMsi(string file) : base(file)
     {
          Properties = new MsiTable<Property>(this);
     }

     public MsiTable<Property> Properties { get; private set; }
}

which can be used like:

var ctx = new MyMsi("c:\\temp\\demo.msi");
var res = from p in ctx.Properties where p.Name == "ProductCode" select p.Value;

Notice the table object implements IEnumerable<T>; enumerating it simply queries the entire table. Since it queries the table, internally it can use the same querying infrastructure as derived queries (which we'll talk about next).
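
The naming convention described above (entity class name as table name, with an attribute overriding a column name) can be sketched as follows. The MsiColumnAttribute definition and the MsiMapping helper are assumptions made for illustration; the post uses [MsiColumn] without showing how it's defined or consumed.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute: the post applies [MsiColumn] but doesn't show its definition.
[AttributeUsage(AttributeTargets.Property)]
public sealed class MsiColumnAttribute : Attribute
{
    public MsiColumnAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

public class Property
{
    [MsiColumn("Property")]
    public string Name { get; set; }

    public string Value { get; set; }
}

// Hypothetical helper, not part of the post's API.
public static class MsiMapping
{
    // Convention from the post: the entity class name doubles as the table name.
    public static string TableName(Type entityType)
    {
        return entityType.Name;
    }

    // The column name is the [MsiColumn] override if present, else the property name.
    public static string ColumnName(PropertyInfo property)
    {
        var attr = (MsiColumnAttribute)Attribute.GetCustomAttribute(property, typeof(MsiColumnAttribute));
        return attr != null ? attr.Name : property.Name;
    }
}

public static class MappingDemo
{
    public static void Main()
    {
        Type t = typeof(Property);
        Console.WriteLine(MsiMapping.TableName(t));                        // Property
        Console.WriteLine(MsiMapping.ColumnName(t.GetProperty("Name")));   // Property
        Console.WriteLine(MsiMapping.ColumnName(t.GetProperty("Value")));  // Value
    }
}
```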

 

The built-in query pattern

C# and VB provide query comprehensions built into the language, which translate into chains of method calls. I've been showing this a lot, but one more time for the query above:

var res = from p in ctx.Properties where p.Name == "ProductCode" select p.Value;

becomes

var res = ctx.Properties.Where(p => p.Name == "ProductCode").Select(p => p.Value);

But one could well write other queries using orderby (which translates into OrderBy* and ThenBy* method calls), or leave out certain parts (e.g. no where clause). Since we don't want to support all query operators (dropping the I in IQueryable, just keeping something queryable) we'll stick with a fluent pattern that allows enough expressiveness and plays nicely together with the built-in comprehensions that are typically used (where, select, orderby). Depending on how flexible you want to make a query provider, you can provide more or fewer of those operators, with a full IQueryable<T> at the far end of the spectrum (trading compile-time errors for runtime exceptions in case of unsupported operators).
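
To see that the comprehension syntax is purely pattern-based, here's a minimal sketch with a made-up Recorder<T> type (not part of LINQ to MSI) that only logs which methods the compiler binds to. It doesn't implement IEnumerable<T> and System.Linq isn't imported, yet the query compiles because methods with fitting names and signatures exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical type for illustration only: it records which methods the compiler calls.
public class Recorder<T>
{
    public readonly List<string> Calls;

    public Recorder() : this(new List<string>()) { }
    internal Recorder(List<string> calls) { Calls = calls; }

    private Recorder<R> Next<R>(string name)
    {
        // Copy the log so every step in the chain gets its own list.
        var calls = new List<string>(Calls);
        calls.Add(name);
        return new Recorder<R>(calls);
    }

    public Recorder<T> Where(Expression<Func<T, bool>> predicate) { return Next<T>("Where"); }
    public Recorder<T> OrderBy<K>(Expression<Func<T, K>> key) { return Next<T>("OrderBy"); }
    public Recorder<T> ThenByDescending<K>(Expression<Func<T, K>> key) { return Next<T>("ThenByDescending"); }
    public Recorder<R> Select<R>(Expression<Func<T, R>> projection) { return Next<R>("Select"); }
}

public static class PatternDemo
{
    public static string Run()
    {
        // The compiler rewrites this into Where/OrderBy/ThenByDescending/Select calls on Recorder<string>.
        var q = from s in new Recorder<string>()
                where s.Length > 3
                orderby s.Length, s descending
                select s.Length;

        return string.Join(".", q.Calls.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // Where.OrderBy.ThenByDescending.Select
    }
}
```

Removing, say, ThenByDescending from Recorder<T> turns the query above into a compile-time error, which is exactly the trade-off the fluent pattern makes.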

What we're going to build in this blog series is in fact a little state machine that accumulates query information (you could therefore - not surprisingly - call it a query accumulator) that supports those three basic operators (restriction using where, ordering using orderby and projection using select) in different orders:

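[Figure: state machine diagram. MsiTable<T> leads to MsiQuery<T> via Where, to MsiOrderedQuery<T> via OrderBy*, and to MsiClosedQuery<T, R> via Select; MsiQuery<T> leads to MsiOrderedQuery<T> via OrderBy* and to MsiClosedQuery<T, R> via Select; MsiOrderedQuery<T> loops back to itself via ThenBy* and leads to MsiClosedQuery<T, R> via Select.]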

Each of those classes inherits from a common query base class that looks like this:

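// Public because the public query classes derive from it (a public class can't have an internal base class).
public abstract class MsiQueryBase<T> : IEnumerable<T>
{
     // internal rather than protected: QueryData is an internal type, so it can't be exposed to derived classes outside the assembly.
     internal QueryData _query;

     internal MsiQueryBase(QueryData query)
     {
          _query = query;
     }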

     IEnumerator IEnumerable.GetEnumerator()
     {
          return GetEnumerator();
     }

     public IEnumerator<T> GetEnumerator()
     {
          return _query.Execute<T>();
     }
}

The accumulative nature lies in the QueryData class we'll outline below. Each time the query gets extended with an additional piece of information, a new QueryData object is created. The reason for this is to keep intermediate query objects available so that multiple views can be constructed. This is more of a general design choice than one that's crucial for the LINQ to MSI provider (since our provider is so limited this "viewing" mechanism would be less useful anyway). With views I mean things like:

var expensive = from p in products where p.UnitPrice > 10 select p;
var notTooExpensive = from p in expensive where p.UnitPrice < 100 select p;

The hidden piece of crucial information here is the fact the chain of method invocations silently continues, conceptually like:

var notTooExpensive = products.Where(p => p.UnitPrice > 10).Select(p => p).Where(p => p.UnitPrice < 100).Select(p => p);

If the second query were to modify the query data encapsulated by the first query, iterating the first query would yield the results of the second one. In other words, immutability is a must in this case. Notice that IQueryable<T> doesn't have this problem because of the way expression trees are generated (or better: composed). With IQueryable, the first query would be an expression tree which the second query simply refers to. We could either mimic this behavior by composing all pieces of query information into an expression tree, or make our internal query data representation immutable. We'll stick with the latter approach, as shown further on.
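
The aliasing problem can be reproduced in a few lines. This is a sketch with made-up classes (MutableQuery and CopyingQuery, with plain strings standing in for expression trees); it chains two Where calls purely for illustration, which the fluent classes in this post deliberately don't allow.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for query data that is modified in place.
public class MutableQuery
{
    public readonly List<string> Filters = new List<string>();

    // Broken: extends the shared state and hands back the same object.
    public MutableQuery Where(string filter)
    {
        Filters.Add(filter);
        return this;
    }
}

// Hypothetical stand-in for query data that is copied on every extension.
public class CopyingQuery
{
    public readonly List<string> Filters;

    public CopyingQuery() : this(new List<string>()) { }
    private CopyingQuery(List<string> filters) { Filters = filters; }

    // Safe: copies the accumulated state before extending it.
    public CopyingQuery Where(string filter)
    {
        var copy = new List<string>(Filters);
        copy.Add(filter);
        return new CopyingQuery(copy);
    }
}

public static class ViewDemo
{
    public static void Main()
    {
        var expensive1 = new MutableQuery().Where("UnitPrice > 10");
        var notTooExpensive1 = expensive1.Where("UnitPrice < 100");
        Console.WriteLine(expensive1.Filters.Count);       // 2: the first view changed behind our back

        var expensive2 = new CopyingQuery().Where("UnitPrice > 10");
        var notTooExpensive2 = expensive2.Where("UnitPrice < 100");
        Console.WriteLine(expensive2.Filters.Count);       // 1
        Console.WriteLine(notTooExpensive2.Filters.Count); // 2
    }
}
```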

Also note that every select operation results in what's called a closed query. Why closed? Because the projection (possibly) loses all track of the original entity information, through anonymous types used in the projection. It's sort of a semi-permeable membrane which can cause everything that follows to be "lost in translation". An example:

var res = (from p in products select new { Name = p.ProductName, Price = p.UnitPrice }).Where(p => p.Price > 100);

If the target query language allows nesting of queries, there's no problem. For example in SQL one can write:

SELECT Name, Price FROM (SELECT ProductName AS Name, UnitPrice AS Price FROM Products) WHERE Price > 100

but lots of other query languages don't allow such flexibility (and indeed, if we were to try this in MSI's SQL implementation, it wouldn't work). In order to reflect this restriction, we close the query once a projection has been carried out. If grouping were supported, it would typically be subject to analogous restrictions, again depending on the underlying data store's flexibility.

 

Query data

The query data object we're going to use to represent the pieces of a query encountered during the chaining of query operators is shown below. It simply consists of a predicate (from Where), a set of orderings (from OrderBy* and ThenBy* calls) and a projection (from Select) as well as the original entity type and a reference to the query provider in order to gain access to the underlying data store. Alternatively, one could mimic the Queryable behavior by constructing an expression tree on the fly using Expression.Call, specifying the current expression tree as the left-hand side (the "this" operand if you will) and the passed-in parameters (like the predicate parameter for Where) as the call operands. I'll come back to this later in this series; for now, we'll construct a simple object like this:

internal class QueryData : ICloneable
{
     public MsiQueryProvider Provider;
     public Expression Where;
     public List<OrderClause> Order;
     public Expression Select;
     public Type EntityType;

     public QueryData()
     {
          Order = new List<OrderClause>();
     }

     public object Clone()
     {
          return new QueryData() { Provider = Provider, EntityType = EntityType, Where = Where, Order = new List<OrderClause>(Order), Select = Select };
     }

     public IEnumerator<T> Execute<T>()
     {
          yield break;
     }
}

Remember C#'s public is only as public as the container, so don't blame me for public fields (or feel free to add automatic property syntax on my behalf). Notice the internal object isn't immutable by itself, something you can debate about of course (or feel free to modify the constructor and add automatic property syntax with private setters on my behalf - just a matter of distributing work between the writer and the readers :-)), but we keep control over it ourselves (the "circle of trust"). Also notice that Expression objects are immutable by their design. Obviously, Execute will be where the real work is to be done, but we keep that for a future post.
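
The OrderClause type referenced by QueryData isn't shown in this post. Judging from how it's used later on (an object initializer setting Mapper and Descending), a minimal shape could look like the sketch below. The field types are an assumption, and the Describe helper and Product class are made up purely to show the clause in action; the actual translation is left for a future post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Assumed shape, inferred from the object initializers used in the post.
internal class OrderClause
{
    public Expression Mapper;   // the key selector lambda, e.g. p => p.Name
    public bool Descending;     // false for OrderBy/ThenBy, true for the *Descending variants
}

// Hypothetical entity used only by this sketch.
internal class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

internal static class OrderClauseDemo
{
    static OrderClause Clause<T, K>(Expression<Func<T, K>> mapper, bool descending)
    {
        return new OrderClause { Mapper = mapper, Descending = descending };
    }

    // Illustrative helper: reads the member name out of a simple key selector.
    internal static string Describe(OrderClause clause)
    {
        var lambda = (LambdaExpression)clause.Mapper;
        var member = lambda.Body as MemberExpression;
        if (member == null)
            throw new NotSupportedException("Only simple member access is handled in this sketch.");
        return member.Member.Name + (clause.Descending ? " (descending)" : " (ascending)");
    }

    static void Main()
    {
        // The list position encodes primary versus secondary ordering.
        var order = new List<OrderClause>
        {
            Clause((Product p) => p.Price, true),
            Clause((Product p) => p.Name, false)
        };

        foreach (var clause in order)
            Console.WriteLine(Describe(clause));
        // Price (descending)
        // Name (ascending)
    }
}
```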

 

Back to the Table<T>

Now that we have the query object and the idea of the fluent pattern, we can return to the MsiTable<T> object:

public class MsiTable<T> : IEnumerable<T>
{
     private QueryData _query;

     public MsiTable(MsiQueryProvider provider)
     {
          _query = new QueryData() { Provider = provider, EntityType = typeof(T) };
     }

     IEnumerator IEnumerable.GetEnumerator()
     {
          return GetEnumerator();
     }

     public IEnumerator<T> GetEnumerator()
     {
          return _query.Execute<T>();
     }

     public MsiQuery<T> Where(Expression<Func<T,bool>> predicate)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Where = predicate;

          return new MsiQuery<T>(query);
     }

     public MsiOrderedQuery<T> OrderBy<K>(Expression<Func<T,K>> orderClause)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Order.Add(new OrderClause { Mapper = orderClause, Descending = false });

          return new MsiOrderedQuery<T>(query);
     }

     public MsiOrderedQuery<T> OrderByDescending<K>(Expression<Func<T,K>> orderClause)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Order.Add(new OrderClause { Mapper = orderClause, Descending = true });

          return new MsiOrderedQuery<T>(query);
     }

     public MsiClosedQuery<T, R> Select<R>(Expression<Func<T,R>> projection)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Select = projection;

          return new MsiClosedQuery<T, R>(query);
     }
}

The constructor is straightforward, and so is the implementation of IEnumerable<T>, which simply asks the query data to execute the query. Actually this code is shared with the MsiQueryBase<T> class, so you could use it as the base class here if you want. However, and this is again philosophical, a table by itself is not a query; it just so happens that internally we fetch results using the same query infrastructure upon enumeration of the table, which argues against using MsiQueryBase<T> as the base class here.

More interesting are the Where, OrderBy* and Select methods. The most important thing here is the signature:

public MsiQuery<T> Where(Expression<Func<T,bool>> predicate)
public MsiOrderedQuery<T> OrderBy<K>(Expression<Func<T,K>> orderClause)
public MsiOrderedQuery<T> OrderByDescending<K>(Expression<Func<T,K>> orderClause)
public MsiClosedQuery<T, R> Select<R>(Expression<Func<T,R>> projection)

All parameters are little expression trees, representing the lambda passed in as the argument. These are captured in a clone of the QueryData object and wrapped inside another MsiQueryBase<T>-derived object, as depicted above in our state machine diagram. To distinguish between ascending and descending ordering, an OrderClause with its Descending property set to either false or true is added to the list of order clauses. This is an ordered list because the primary ordering and any subsequent ones (secondary, tertiary and so on, represented by ThenBy* calls, see below) are by themselves ordered.

 

Query, OrderedQuery and ClosedQuery

Finally we've arrived at the other query classes, starting with MsiQuery, which represents a query capturing a predicate (Where):

public class MsiQuery<T> : MsiQueryBase<T>
{
     internal MsiQuery(QueryData query) : base(query)
     {
     }

     public MsiOrderedQuery<T> OrderBy<K>(Expression<Func<T,K>> orderClause)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Order.Add(new OrderClause { Mapper = orderClause, Descending = false });

          return new MsiOrderedQuery<T>(query);
     }

     public MsiOrderedQuery<T> OrderByDescending<K>(Expression<Func<T,K>> orderClause)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Order.Add(new OrderClause { Mapper = orderClause, Descending = true });

          return new MsiOrderedQuery<T>(query);
     }

     public MsiClosedQuery<T, R> Select<R>(Expression<Func<T,R>> projection)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Select = projection;

          return new MsiClosedQuery<T, R>(query);
     }
}

Actually there's nothing new in here: all methods have been seen before in exactly the same shape. This method-level duplication is the result of trading IQueryable's endless composability for compile-time verified limited composition in our fluent pattern. Obviously, there are ways to "share" the common logic (which in this case amounts to 3 shared lines of code per method). The reader should feel free to perform this refactoring. The core take-away is the limited number of methods per query class; in this particular case there's no Where method (since an object of this type is itself the result of applying Where in the depicted state machine). If we were to support multiple Where calls we'd have to AND them together, which wouldn't be a problem since Where and OrderBy*/ThenBy* operations commute (i.e. you can put them anywhere in the chain of method calls, as long as you don't cross a Select boundary, which introduces tricky nested queries as outlined above), but we deliberately choose to restrict the implementation in the scope of this series.
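
One possible way to carry out that refactoring is to move the clone-and-extend steps into the query data itself. The sketch below uses a trimmed-down QueryData (Provider, EntityType and Execute are left out) and hypothetical WithWhere, WithOrder and WithSelect helpers; none of these names come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

internal class OrderClause
{
    public Expression Mapper;
    public bool Descending;
}

// Trimmed-down QueryData with hypothetical With* helpers that never touch the original object.
internal class QueryData
{
    public Expression Where;
    public List<OrderClause> Order = new List<OrderClause>();
    public Expression Select;

    private QueryData Copy()
    {
        // The order list is copied too, so extending the copy leaves the original's list alone.
        return new QueryData { Where = Where, Order = new List<OrderClause>(Order), Select = Select };
    }

    public QueryData WithWhere(Expression predicate)
    {
        QueryData copy = Copy();
        copy.Where = predicate;
        return copy;
    }

    public QueryData WithOrder(Expression mapper, bool descending)
    {
        QueryData copy = Copy();
        copy.Order.Add(new OrderClause { Mapper = mapper, Descending = descending });
        return copy;
    }

    public QueryData WithSelect(Expression projection)
    {
        QueryData copy = Copy();
        copy.Select = projection;
        return copy;
    }
}

internal static class RefactoringDemo
{
    static void Main()
    {
        Expression<Func<string, bool>> predicate = s => s.Length > 3;
        Expression<Func<string, int>> key = s => s.Length;

        QueryData table = new QueryData();
        QueryData filtered = table.WithWhere(predicate);
        QueryData ordered = filtered.WithOrder(key, false);

        Console.WriteLine(table.Where == null);   // True: the original is untouched
        Console.WriteLine(filtered.Order.Count);  // 0
        Console.WriteLine(ordered.Order.Count);   // 1
    }
}
```

With helpers like these, each fluent method would shrink to a single line, for example return new MsiOrderedQuery<T>(_query.WithOrder(orderClause, false)); while keeping the restricted set of methods per class.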

The MsiOrderedQuery<T> class is a little different since two new methods, ThenBy and ThenByDescending, are introduced. However, those are not rocket science either:

public class MsiOrderedQuery<T> : MsiQueryBase<T>
{
     internal MsiOrderedQuery(QueryData query) : base(query)
     {
     }

     public MsiClosedQuery<T, R> Select<R>(Expression<Func<T,R>> projection)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Select = projection;

          return new MsiClosedQuery<T, R>(query);
     }

     public MsiOrderedQuery<T> ThenBy<K>(Expression<Func<T,K>> orderClause)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Order.Add(new OrderClause { Mapper = orderClause, Descending = false });

          return new MsiOrderedQuery<T>(query);
     }

     public MsiOrderedQuery<T> ThenByDescending<K>(Expression<Func<T,K>> orderClause)
     {
          QueryData query = (QueryData)_query.Clone();
          query.Order.Add(new OrderClause { Mapper = orderClause, Descending = true });

          return new MsiOrderedQuery<T>(query);
     }
}

And finally, there's our closed query object, its closed nature being reflected by the lack of further query operators.

public class MsiClosedQuery<T, R> : MsiQueryBase<R>
{
     internal MsiClosedQuery(QueryData query) : base(query)
     {
     }
}

In fact, this isn't the end of everything: the object still is an IEnumerable<T>, so the user can continue to query using LINQ to Objects if System.Linq is in scope. The careful reader will have noticed this from the very beginning, when MsiQueryBase<T> was introduced. For LINQ to MSI, we can see this as a feature, but for "remotable" kinds of LINQ providers I'd consider this a bug in the provider implementation. Why? The compiler would silently fall back to LINQ to Objects operators and you could well get off to a "bad start":

var res = from p in products join c in categories on p.Category equals c ...

Assume products and categories both represent a remote object but the provider doesn't support the Join operation (or GroupJoin, if the join clause ends with into). If the objects still implement IEnumerable<T>, the LINQ to Objects operators will be in scope and you've silently created a local LINQ query that will suck down all objects from both tables for a client-side join. In LINQ to MSI such problems are not that big a deal since the "remote database" isn't really remote; it's just a local file. And obviously, any logging mechanism on the provider (typically a .Log property) will reveal the query sent to the database, so if that shows a "SELECT * fetch all" kind of query, it should trigger suspicion. With IQueryable, AsEnumerable acts as an explicit boundary between remote and local execution and solves this problem.

There are ways around this problem of course (apart from buying in to the whole IQueryable stuff). For example, you could require explicit execution by means of an Execute method that returns the enumerator - a little less user-friendly but it rescues the mission anyway. Notice I said to return an "enumerator", not a memory-persisted list with all the pre-fetched results - this leaves room for lazy fetching of results through a data reader object which can be stopped at any moment without wasting network resources (e.g. you might still have a client-side Take(5) operation which would only cause at most five calls to *DataReader.Read()).
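
The lazy fetching argument is easy to check with a fake data source. In the sketch below, FakeQuery is a made-up class with no database behind it; its Execute method returns a lazily evaluated IEnumerable<int> (rather than a bare enumerator, so that Take can be applied directly) and counts how many rows were actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a query backed by a data reader; no real database is involved.
public class FakeQuery
{
    public int Reads; // how many rows were actually fetched

    // Explicit execution: nothing is fetched until the caller starts iterating.
    public IEnumerable<int> Execute()
    {
        for (int row = 0; row < 1000; row++)
        {
            Reads++;            // stands in for a call to *DataReader.Read()
            yield return row;
        }
    }
}

public static class LazyDemo
{
    public static int ReadsForTake(int n)
    {
        var query = new FakeQuery();
        List<int> firstRows = query.Execute().Take(n).ToList();
        return query.Reads;
    }

    public static void Main()
    {
        Console.WriteLine(ReadsForTake(5)); // 5, not 1000
    }
}
```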

 

Known limitations

The query structure for our LINQ to MSI implementation supports Where, OrderBy*, ThenBy* and Select operators. MSI supports the following:

SELECT [DISTINCT]{column-list} FROM {table-list} [WHERE {operation-list}] [ORDER BY {column-list}]

which apparently only leaves out the DISTINCT operator (which is trivial to implement). However, there's one more restriction hidden in the FROM clause: {table-list}. We won't support joins in this implementation (for now) although MSI has limited support for them:

Only inner joins are supported and are specified by a comparison of columns from different tables. Circular joins are not supported. A circular join is a SQL query that links three or more tables together into a circuit.

I'll come back to this in a later post, outlining what it would take to implement support for this.
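
To make the SELECT template quoted above a bit more tangible, here's a rough sketch of filling it in for a single table, with DISTINCT as a simple flag. BuildSelect and its parameters are hypothetical and this is not the translation this series will end up with; the WHERE clause is passed in as ready-made text since expression translation is the subject of the next post.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MsiSqlSketch
{
    // Hypothetical helper: fills in the SELECT template for a single table (no joins).
    public static string BuildSelect(string table, IList<string> columns, string where, IList<string> orderBy, bool distinct)
    {
        var sql = new StringBuilder("SELECT ");
        if (distinct) sql.Append("DISTINCT ");
        sql.Append(QuoteAll(columns));
        sql.Append(" FROM ").Append(Quote(table));
        if (!string.IsNullOrEmpty(where)) sql.Append(" WHERE ").Append(where);
        if (orderBy != null && orderBy.Count > 0) sql.Append(" ORDER BY ").Append(QuoteAll(orderBy));
        return sql.ToString();
    }

    // Windows Installer SQL lets table and column names be enclosed in grave accents.
    static string Quote(string name) { return "`" + name + "`"; }

    static string QuoteAll(IList<string> names)
    {
        var quoted = new string[names.Count];
        for (int i = 0; i < names.Count; i++) quoted[i] = Quote(names[i]);
        return string.Join(", ", quoted);
    }

    public static void Main()
    {
        Console.WriteLine(BuildSelect("Property", new[] { "Value" }, "`Property` = 'ProductCode'", null, false));
        // SELECT `Value` FROM `Property` WHERE `Property` = 'ProductCode'
    }
}
```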

 

Next time...

...we'll start the translation battle. Stay tuned!

Comments

# re: LINQ to MSI - Part 2 - Queryable without an I

Monday, June 23, 2008 2:18 AM by Kjell-Åke Andersson

Have you looked at the LINQ-To-MSI provider that is available in the Deployment Tools Foundation in Windows Installer XML (WiX)?

http://wix.sourceforge.net

# re: LINQ to MSI - Part 2 - Queryable without an I

Sunday, June 29, 2008 5:05 PM by Darrell Wright

Stupid question, but what is an OrderClause? I assume it is an internal class with two public fields, Mapper (an Expression) and a bool Descending?

# LINQ to MSI

Friday, July 25, 2008 10:51 AM by InstallSite Blog

LINQ stands for Language-Integrated Query and enables you to directly query databases in .NET programming