Saturday, June 07, 2008 8:44 PM bart

LINQ through PowerShell

In a reaction to my post on LINQ to MSI yesterday, Hal wrote this:

I don't know enough about the dev side to know if this is a stupid question or not but here goes: Would I be able to take advantage of LINQ to MSI (or LINQ in general from a wider point-of-view) from within PowerShell?  I know someone made an MSI snapin but I seem to recall it being a pretty simple thing.  Having the ability for admins to query and work with MSI packages seems like it could be awfully useful, and the point of not learning yet another SQL variant rings true for everyone, not just developers.  :)

Obviously not a stupid question at all (there are only stupid answers, to start with a cliché for once :-)). Having LINQ capabilities in PowerShell is definitely something I've given some thought - actually it was one of my fun projects a few months back, but there are quite a few challenges associated with it. However, Hal's comment made me think about it a bit more, so I mixed in another piece of magic called "Dynamic LINQ". Let's take a look at an experimental journey through "LINQ in PowerShell" integration.

 

There's no such thing as a pipe...

Well, at least not a unique definition of one. Pipelines are found in various corners of the computer science landscape, but two major implementations stand out:

  • First of all, there's the pipe model employed by shells, with typical examples in the UNIX world, the DOS shell and obviously Windows PowerShell. This kind of pipe model works in a push-like, eager fashion: the source at the left provides data that's pushed through the pipe, which acts as a filtering (e.g. PS where) and transformation mechanism (e.g. PS select). Data appearing at the end is the effect of data being pushed in at the front. It's typically eager because sending the chained set of commands that form the pipeline to the command processor triggers execution immediately, in a left-to-right fashion.
  • On the other side, there's the lazy variant of the pipe model used by LINQ's monadic query comprehension model. Data flows in at the left again, but it doesn't start to flow until the pipeline participant on the right pulls data out of it. So ultimately a chain of query operators pulls data from the source, starting all the way from the right. This laziness makes LINQ stand out since no more data fetching work is done than strictly needed (e.g. if you do a Take(5) on a sequence of 1,000 items, no more than 5 items will be fetched).
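To make the pull model concrete, here's a small self-contained C# sketch (my own illustration, not one of the original samples) where a logging iterator stands in for a data source; Take(5) ensures only ten source items are ever produced, since finding five even numbers requires inspecting 1 through 10:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LazyDemo
{
    // Pretend data source that logs every item it actually yields.
    static IEnumerable<int> Fetch(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Console.WriteLine("fetching " + i);
            yield return i;
        }
    }

    static void Main()
    {
        // Building the query does no work yet; enumeration pulls from the right.
        var query = Fetch(1000).Where(x => x % 2 == 0).Take(5);

        foreach (int x in query)
            Console.WriteLine("got " + x);

        // Only items 1..10 are fetched, not all 1,000.
    }
}
```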

Two different models of pipelines that prove hard to unify. When thinking about LINQ in PowerShell, though, it would be handy to leverage existing idioms rather than creating a whole new query language inside the language - although that would work too, as long as it feels natural enough. Dynamic LINQ provides a middle ground: the operators (Where, Select, OrderBy, etc.) are still implemented as method calls, while their arguments are based on a textual expression language.

 

Dynamic LINQ

Scott blogged about Dynamic LINQ a while ago in his Dynamic LINQ (Part 1: Using the LINQ Dynamic Query Library) post. In essence, Dynamic LINQ talks to any IQueryable-capable LINQ implementation and allows you to write things like:

nw.Products.Where("UnitPrice > 100").Select("New(ProductName As Name, UnitPrice As Price)");

The string islands in here are converted into expression trees at runtime and executed against the IQueryable data source. Notice that dynamic types are generated on the fly as well: in the sample above, the projection creates a type with two properties, Name and Price. This moves the boundary between compile time and runtime a bit further: the expression strings become expression trees, which in turn get translated into the back-end language by the provider being targeted (SQL, LDAP, CAML, WS calls, etc.).
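For comparison, the statically compiled equivalent of that query looks roughly like this, with an anonymous type playing the role of the runtime-generated one. The in-memory Product array below is just an illustrative stand-in for the Northwind table:

```csharp
using System;
using System.Linq;

class Product
{
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}

class Demo
{
    static void Main()
    {
        // In-memory stand-in for the Northwind Products table (sample data).
        var products = new[]
        {
            new Product { ProductName = "Chai", UnitPrice = 18m },
            new Product { ProductName = "Côte de Blaye", UnitPrice = 263.5m }
        };

        // Compile-time equivalent of the dynamic query:
        // Where("UnitPrice > 100").Select("New(ProductName As Name, UnitPrice As Price)")
        var query = products
            .Where(p => p.UnitPrice > 100)
            .Select(p => new { Name = p.ProductName, Price = p.UnitPrice });

        foreach (var item in query)
            Console.WriteLine(item.Name + ": " + item.Price);
    }
}
```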

What you need to get started with this post is the following download: C# Dynamic Query Library (included in the \LinqSamples\DynamicQuery directory)

 

Encapsulating a data source: Northwind strikes again

Moving the boundary from compile time to runtime is a good thing in quite some cases, but extremes are rarely the best choice. Fully static typing won't work here: PowerShell works with an interpreter and lots of runtime supporting features (such as ETS) - and a good thing it does. You don't sell compilers to IT pros. For our LINQ mission, though, complete dynamic typing wouldn't work that well either: we're going to access a data store that has rich data with specific type information. We'd better encapsulate this so that our PowerShell scripts can take advantage of it. For example, a SQL database table maps to a strongly-typed entity class, which is precisely what LINQ to SQL's sqlmetal tool (or the equivalent designer) generates.

However, I do agree that for some types of data sources, a more ad hoc access mechanism is more appropriate than for others. Ad hoc here doesn't just point at the query capability (after all, that's exactly what we want to realize) but also at the hoops you have to jump through to get access to the data. I'd categorize SQL under the ad hoc extremes: SQL has always been so easy to access (SQL syntax, the SQLCMD or OSQL tools) that it's a pity we have to create an entity type upfront to access any table whatsoever. But if you want ad hoc data access, there's still regular SQL (trading strongly-typed objects representing the data for dynamism). On the other side there are things like AD, where the schema rarely changes and the entities could be part of a library that exposes all of the entities that ship with AD. Once that one's loaded, you have virtually unlimited (and strongly-typed) ad hoc query capability through LINQ. In the end, it depends on the flux of the data source: requiring new types every time a SQL database schema changes is definitely overhead, but for things like AD and e.g. MSI (which has fixed tables) it would be less of a stumbling block.

Let's go for a LINQ to SQL sample anyway despite all of this philosophical fluff, so create a new Class Library project in VS and add a LINQ to SQL Classes file to it:

image

Drag and drop all of the tables from the Northwind database from the Server Explorer to the designer and compile the assembly. Data access made easy - that's what LINQ's all about!

image

This can actually already be used from PowerShell:

image

Since the context object is just an object, you can create an instance of it like this:

[System.Reflection.Assembly]::LoadFile("C:\temp\LINQthroughPowerShell\Northwind\bin\Debug\Northwind.dll")
$ctx = new-object Northwind.NorthwindDataContext

Just load the DLL and use new-object. As you can see in the screenshot above, everything we need is available. But...

 

Breaking eagerness

IEnumerables are what make PowerShell pipes tick (amongst other "streams" of objects provided by participants in the pipeline). That's too eager for our purposes. Let me show you: in the session above, type $ctx and see what happens:

image

Oops, all the data is in the console already. Handy, but wasteful. Why does this happen? Tables in LINQ to SQL are of type Table<T> (where T is the entity type), which is IQueryable<T> and thus IEnumerable<T>. PowerShell, eager as it is, enumerates over IEnumerables to get their results (which makes sense in this context). If you turn on logging on the LINQ to SQL data context, you'll see precisely what happens:

image
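Turning that log on is a one-liner, since DataContext.Log is a plain TextWriter property; from a PowerShell session it might look like this (sketch):

```powershell
# Echo every generated T-SQL batch to the console.
$ctx.Log = [Console]::Out

# Enumerating any table now shows the SELECT statements being sent.
$ctx.Products
```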

So, how can we break eagerness? We don't have such a thing as a lazy pipeline, so let's create one by having two markers: a cmdlet that establishes a "lazy context" and one that terminates it. Everything in between flowing through the pipe won't be an IEnumerable but a "captured IEnumerable" - in our case more specifically an IQueryable - which we rewrite throughout the pipe by adding LINQ operators to it through Dynamic LINQ. I assume readers of this blog are familiar with cmdlet development; if not, check out my Easy Windows PowerShell cmdlet development and debugging post.

Below is the class that will encapsulate the IQueryable to suppress eager evaluation, creating our object traveling through the lazy context:

public class LinqQuery
{
    private IQueryable _queryable;

    internal LinqQuery(IQueryable queryable)
    {
        _queryable = queryable;
    }

    public string Expression
    {
        get
        {
            return _queryable.Expression.ToString();
        }
    }

    internal IQueryable Query
    {
        get
        {
            return _queryable;
        }
    }
}

and to establish a lazy context, we'll provide a New-Query cmdlet:

[Cmdlet("New", "Query")]
public class NewQueryCmdlet : Cmdlet
{
    [Parameter(Mandatory = true, Position = 0)]
    public IQueryable Input { get; set; }

    protected override void ProcessRecord()
    {
        WriteObject(new LinqQuery(Input));
    }
}

And finally, to end the context, triggering evaluation, we'll have:

[Cmdlet("Execute", "Query")]
public class ExecuteQueryCmdlet : Cmdlet
{
    [Parameter(Mandatory = true, Position = 0)]
    public LinqQuery Input { get; set; }

    protected override void ProcessRecord()
    {
        WriteObject(Input.Query);
    }
}

This last one is interesting in that it returns the IQueryable, which by means of the eager pipeline triggers execution (since LINQ providers reach out to the server fetching results upon calling GetEnumerator).

 

Dynamic LINQ query operator cmdlets

This barely needs any explanation whatsoever because of the simplicity of the Dynamic LINQ library. We just start by importing the namespace:

using System.Linq.Dynamic;

And start writing our first cmdlet for Where:

[Cmdlet("Where", "LinqObject")]
public class WhereCmdlet : Cmdlet
{
    [Parameter(Mandatory = true, ValueFromPipeline = true)]
    public LinqQuery Input { get; set; }

    [Parameter(Mandatory = true, Position = 0)]
    public string Predicate { get; set; }

    protected override void ProcessRecord()
    {
        WriteObject(new LinqQuery(Input.Query.Where(Predicate)));
    }
}

Notice we're taking in a lazy LinqQuery object from the pipeline and emitting a new LinqQuery object in ProcessRecord. This makes a LinqQuery object immutable, and one can define a LinqQuery object by means of the pipeline for later reuse (e.g. have a query object that establishes a view on data, and then write multiple queries on top of that). The Predicate parameter takes a string containing the expression-language predicate for the query. Below you can see all of the extension methods brought into scope by Dynamic LINQ:

image

So we'll create a cmdlet for each of those, which is absolutely straightforward. Actually, this could be done purely declaratively with my Method Invocation Cmdlet mechanism too, but let's not go there.
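To give an idea of how these cmdlets compose, here's a hypothetical session (cmdlet names as defined above; note that Execute-Query binds its Input positionally rather than from the pipeline):

```powershell
# Establish a lazy context over the Products table.
$products = new-query $ctx.Products

# Build on it; each step returns a new, immutable LinqQuery.
$expensive = $products | Where-LinqObject "UnitPrice > 100"
$expensive.Expression      # inspect the captured expression

# Only now does a single SELECT hit the server.
execute-query $expensive
```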

 

Putting it to the test

Assume all cmdlets have been written (either do it yourself or download the code below). Time to do a sample:

image

It's that easy. And notice the SQL query being sent to the server - precisely what we were looking for, fetching only what we requested. I did split the query across a few lines to show the expression being generated behind the scenes, which is just a normal chain of method calls captured in a dynamically generated expression tree at runtime. Without this splitting, writing a query is a one-liner:

image

The query syntax is different from PowerShell syntax (e.g. > instead of -gt), but that's merely a syntax issue which would be easy to change. And the cmdlet names are pretty long, but there are aliases (lwhere, lsort, ltake, lselect for instance).

 

Implicit more intelligent lazy scoping

Actually, what we established above is somewhat equivalent to an explicit Dispose call versus a using block. In this case, we have a LinqQuery object created by new-query and disposed of by execute-query. The latter we can make implicit if we assume that the end of a pipeline should trigger evaluation. That's debatable since it doesn't allow you to keep a query object across multiple invocations. Depending on your taste for explicitness, you might like this behavior and provide a "defer-query" opt-out cmdlet. A simple way to do this intelligent auto-evaluation is by using the PSCmdlet base class's MyInvocation property:

public abstract class LazyCmdlet : PSCmdlet
{
    [Parameter(Mandatory = true, ValueFromPipeline = true)]
    public LinqQuery Input { get; set; }

    protected abstract LinqQuery Process();

    protected override void ProcessRecord()
    {
        LinqQuery result = Process();

        if (MyInvocation.PipelinePosition < MyInvocation.PipelineLength)
        {
            WriteObject(result);
        }
        else
        {
            WriteObject(result.Query);
        }
    }
}

Instead of having the Dynamic LINQ cmdlets override ProcessRecord directly, we let them implement Process; depending on the position in the invocation chain, our base class either returns the query object (avoiding eager expansion by the pipeline) or the IQueryable inside it, making it expand and fetch results. Here's the corresponding class diagram:

image

and with some aliases you can now write:

image
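In text form, such an aliased one-liner might look like this (a sketch using the alias names mentioned earlier; the trailing execute-query is no longer needed because the last cmdlet detects it ends the pipeline):

```powershell
new-query $ctx.Products | lwhere "UnitPrice > 100" | lsort "UnitPrice" | lselect "New(ProductName As Name, UnitPrice As Price)"
```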

 

Download it

If you want to play with this, you can get the code here: LinqThroughPowerShellProvider.cs. It doesn't include the Dynamic LINQ codebase, which you can get from C# Dynamic Query Library (included in the \LinqSamples\DynamicQuery directory).

Build instructions:

  1. Create a new Class Library project.
  2. Add a reference to System.Management.Automation.dll (from %programfiles%\Reference Assemblies).
  3. Add a reference to System.Configuration.Install.dll.
  4. Add LinqThroughPowerShellProvider.cs to it.
  5. Add Dynamic.cs from the Dynamic LINQ library to it.
  6. Build it.

Install instructions:

  1. Open an elevated prompt and go to the bin\Debug build output folder.
  2. Execute installutil -i <name of the dll>

Run instructions:

  1. Open Windows PowerShell.
  2. Execute add-pssnapin LTP
  3. Play around with samples :-). You can use any LINQ provider with IQueryable support (e.g. LINQ to SQL, AD, SharePoint, etc).

Have fun!



Comments

# Making the pipeline lazy, LINQ through PowerShell

Sunday, June 08, 2008 1:52 AM by The PowerShell Guy

Bart de Smet (again) made a great PowerShell post, in reaction of a question from Hal Rottenberg (of

# re: LINQ through PowerShell

Sunday, June 08, 2008 10:33 AM by Hal Rottenberg

Awesome!  I'll have to pin you down at TechEd to explain all this.  :D

# re: LINQ through PowerShell

Sunday, June 08, 2008 3:37 PM by karl prosser

This is just brilliant.
