Tuesday, July 04, 2006 5:29 PM bart

LINQ - A custom implementation of the .NET Standard Query Operators

Introduction

Finally I found the time to catch up with the evolution on LINQ the last couple of months. The May 06 CTP of LINQ is without doubt a great one and the first one that reflects where Microsoft is taking data access technology in the future.

Btw, don't forget to check out http://blogs.msdn.com/dataaccess for more information about stuff like ADO.NET vNext and entities. Another useful resource is the Channel 9 video with Anders Hejlsberg and Sam Druker.

The .NET Standard Query Operators explained

So, what did I do the past couple of days? When crawling through the May 06 CTP release, I found the document entitled "The .NET Standard Query Operators" that contains information about all of the query operators supported by LINQ that ship with the technology out of the box. To give you an idea about these operators, a short list:

  • Restriction operators (1.3) - Where
  • Projection operators (1.4) - Select, SelectMany
  • Partitioning operators (1.5) - Take, TakeWhile, Skip, SkipWhile
  • Join operators (1.6) - Join, GroupJoin
  • Concatenation operators (1.7) - Concat
  • Ordering operators (1.8) - OrderBy, OrderByDescending, ThenBy, ThenByDescending, Reverse
  • Grouping operators (1.9) - GroupBy
  • Set operators (1.10) - Distinct, Union, Intersect, Except
  • Conversion operators (1.11) - ToSequence, ToArray, ToList, ToDictionary, ToLookup, OfType, Cast
  • Equality operators (1.12) - EqualAll
  • Element operators (1.13) - First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault, DefaultIfEmpty
  • Generation operators (1.14) - Range, Repeat, Empty
  • Quantifiers (1.15) - Any, All, Contains
  • Aggregate operators (1.16) - Count, LongCount, Sum, Min, Max, Average, Aggregate

LINQ comes with an implementation of all those operators in the class System.Query.Sequence which contains a lot of so-called extension methods (a new feature of C# 3.0).

About extension methods

So, what are extension methods? Glad you asked. Basically, extension methods allow you to extend any class with addional (instance-level) methods without having code access to that particual class. Let me give you an example, with System.String.

Extension methods are static methods defined in a static class, assume the following one:

namespace MyExtensions
{
   public static class Extensions
   {
      public static string Reverse(this string str)
      {
         System.Text.StringBuilder sb = new System.Text.StringBuilder(str.Length);
         for (int i = str.Length - 1; i >= 0; i--)
            sb.Append(str[i]);
         return sb.ToString();
      }
   }
}

The most important keyword is indicated in bold, this. By prefixing the first parameter of the static method with the keyword 'this', the method becomes an extension method for the type specified by this very parameter. In this case, the Reverse method becomes an extension method for System.String.

To use the extension method, one keyword is required again: using.

using MyExtensions;

public class Demo
{
   public static void Main(string[] args)
   {
      string s = "Hello LINQ";
      Console.WriteLine(s.Reverse()); //will print "QNIL olleH"
   }
}

The using keyword not only imports a namespace, but it also provides access to the extension methods (defined in static classes as mentioned above) defined in that namespace. This allows us to write something like s.Reverse() although the type System.String doesn't contain a definition for the Reverse method.

Note: Extension methods have the lowest precedence when resolving method call invocations. Thus, if a method Reverse was defined on System.String, that one would take precedence over our extension method. This will lead to a very powerful mechanism when talking about LINQ, see further.

For the geek fellows, it might be interested what happens behind the scenes. Therefore, a little IL:

  • The extension method becomes a straightforward static method in the Extensions class:

    .method public hidebysig static string Reverse(string str) cil managed

  • The call to the extension method becomes a static method invocation:

    IL_0007: ldloc.0
    IL_0008: call string Extensions::Reverse(string)

So, under the covers the CLR isn't aware of this extension method thingie (btw, LINQ runs on top of .NET Framework 2.0).

Back to the query operators

Most query operators are extension methods that extend the generic interface System.Collections.Generics.IEnumerable<T>. Think about it for a while and imagine the scope of these extensions (how many classes do you know that implement this interface?). In order to have a good understanding of how all of these operators work, I decided to take the raw spec document and translate it into code, implementing all of the operators mentioned above (to give you an idea about the effort: 4301 lines of code including XML comments + 12761 lines of unit testing code).

Let's give you an example of one such method:

public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
   if (source == null || predicate == null
)
      throw new ArgumentNullException
();

   foreach (T item in
source)
      if
(predicate(item))
         yield return
item;
}

As you can see, it's a static method and the first parameter is denoted with the this keyword, making it an extension method for the type IEnumerable<T>. The second parameter is of type Func<T, bool>, which is a delegate defined as follows:

public delegate TR Func<T0, TR>(T0 a0);

The body of the code is pretty straightforward. As the source parameter is of type IEnumerable<T>, a foreach loop can be used to iterate over it. In the end, a foreach loop is just a shorthand syntax for a while-loop like the following one:

   while (source.MoveNext())
   {
      T item = source.Current;
      //Use it
   }

The if (predicate(item)) condition checks whether the current item should be returned or not. If that's the case the item is yielded using the yield return item; statement. Maybe the concept of yielding is unknown to you, so I'll do a separate blog post about iterators in C# 2.0 pointing out how this works.

In the end, you can write something like this:

public static void Main()
{

   string
[] names = new string[] { "Bart", "Bill", "Steve", "John", "Rob" };

   foreach (string name in
names.Where(s => s.Length == 4)) //Will print "Bart\nBill\nJohn"
      Console.WriteLine(name);
}

Oops, another big IUnknown thing in the code snippet above: s => s.Length == 4. That's a so-called lambda expression which is also new to C# 3.0.

About lambda expressions

Lamba expressions are an easier way to write anonymous methods in C#. In C# 2.0, anonymous methods were introduced, resulting in easier code. The code snippet above could be rewritten as:

public static void Main()
{
  
string
[] names = new string[] { "Bart", "Bill", "Steve", "John", "Rob"
};

   foreach (string name in
names.Where(delegate (string s) { return s.Length == 4; })) //Will print "Bart\nBill\nJohn"
      Console.WriteLine(name);
}

In C# 1.x one should have written a separate method for the required delegate's logic:

public static void Main()
{

   string
[] names = new string[] { "Bart", "Bill", "Steve", "John", "Rob"
};

   foreach (string name in
names.Where(WhereClause)) //Will print "Bart\nBill\nJohn"
      Console.WriteLine(name);
}

static bool WhereClause(string s)
{
   return s.Length == 4;
}

From the C# 3.0 spec: "Lambda expressions provide a more concise, functional syntax for writing anonymous methods."

Making the link with LINQ

So, to recap we saw how to use an extension method such as Where to perform operations on a sequence (the LINQ-adage for an IEnumerable<T>). As these extension methods also return sequences, one can chain a bunch of operators. As I've only defined the Where method in this post, let's stick with that:

public static void Main()
{

   string[] names = new string[] { "Bart", "Bill", "Steve", "John", "Rob" };

   foreach (string name in
names.Where(s => s.Length == 4).Where(s => s.StartsWith("B"))) //Will print "Bart\nBill"
      Console
.WriteLine(name);
}

Another more complex example is:

names.Where(s => s.Length == 4).OrderByDescending(s => s).Select(s => s.ToUpper())

Imagine the complexity you would have to go through without lambda expressions, without anonymous methods and without extension methods. You would end up with 3 methods containing the one-line lambda expressions equivalent code, and you would have to use static calls (throwing away the chaining capability). An example:

public static void Main()
{
   string[] names =  = new string[] { "Bart", "Bill", "Steve", "John", "Rob" };
   IEnumerable<string> s1 = Sequence.Where(names, WhereClause);
   IEnumerable<string> s2 = Sequence.OrderByDescending(s1, OrderByDescendingClause);
   IEnumerable<string> result = Sequence.Select(s2, SelectClause);

   foreach (string name in
result) //Will print "Bart\nBill"
      Console.WriteLine(name);
}

bool WhereClause(string s) { return s.Length == 4; }
string OrderByDescendingClause(string s) { return s; }
string SelectClause(string s) { return s.ToUpper(); }

Now, all of this already looks pretty cool but we can still go a little further: enter LINQ (Language Integrated Query). LINQ for C# 3.0 allows you to write the following:

public static void Main()
{
   string[] names =  = new string[] { "Bart", "Bill", "Steve", "John", "Rob" };


   IEnumerable<string> result = from in names
                                where s.Length == 4
                                orderbydescending
                                select s.ToUpper();

   foreach (string name in
result) //Will print "JOHN\nBILL\nBART"
      Console.WriteLine(name);
}

Visual Basic 9.0 has also LINQ support, but I won't cover that for now (it's basically the same syntax).

If you like, take a look at the IL code generated by this example. The relevant code (i.e. the LINQ query) translates to the following piece of IL:

IL_0032: ldloc.0
IL_0033: ldsfld class BdsSoft.Query.Func`2<string,bool> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate7'
IL_0038: brtrue.s IL_004d
IL_003a: ldnull
IL_003b: ldftn bool BdsSoft.Query.Program::'<Main>b__0'(string)
IL_0041: newobj instance void class BdsSoft.Query.Func`2<string,bool>::.ctor(object, native int)
IL_0046: stsfld class BdsSoft.Query.Func`2<string,bool> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate7'
IL_004b: br.s IL_004d
IL_004d: ldsfld class BdsSoft.Query.Func`2<string,bool> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate7'
IL_0052: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0> BdsSoft.Query.Sequence::Where<string>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>,class BdsSoft.Query.Func`2<!!0,bool>)
IL_0057: ldsfld class BdsSoft.Query.Func`2<string,string> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate8'
IL_005c: brtrue.s IL_0071
IL_005e: ldnull
IL_005f: ldftn string BdsSoft.Query.Program::'<Main>b__3'(string)
IL_0065: newobj instance void class BdsSoft.Query.Func`2<string,string>::.ctor(object, native int)
IL_006a: stsfld class BdsSoft.Query.Func`2<string,string> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate8'
IL_006f: br.s IL_0071
IL_0071: ldsfld class BdsSoft.Query.Func`2<string,string> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate8'
IL_0076: call class BdsSoft.Query.OrderedSequence`1<!!0> BdsSoft.Query.Sequence::OrderByDescending<string,string>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class BdsSoft.Query.Func`2<!!0,!!1>)
IL_007b: ldsfld class BdsSoft.Query.Func`2<string,string> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate9'
IL_0080: brtrue.s IL_0095
IL_0082: ldnull
IL_0083: ldftn string BdsSoft.Query.Program::'<Main>b__5'(string)
IL_0089: newobj instance void class BdsSoft.Query.Func`2<string,string>::.ctor(object, native int)
IL_008e: stsfld class BdsSoft.Query.Func`2<string,string> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate9'
IL_0093: br.s IL_0095
IL_0095: ldsfld class BdsSoft.Query.Func`2<string,string> BdsSoft.Query.Program::'<>9__CachedAnonymousMethodDelegate9'
IL_009a: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!1> BdsSoft.Query.Sequence::Select<string,string>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class BdsSoft.Query.Func`2<!!0,!!1>)
IL_009f: stloc.1

As you can see, we end up with three anonymous methods: <Main>b__0 contains the where clause code, <Main>b__3 contains the orderby clause code and <Main>b__5 contains the select clause code. The foreach-related code was omitted as I'll cover that in my upcoming iterator-related post.

Get the code!

I've uploaded my custom implementation of all the .NET Standard Query Operators on my website. You can grab it over here. The download includes a VS2005 solution including a console application with the implementation of the Standard Query Operators (I chose to create a console application so that you can start experimenting with it right away in the Program.cs file) and 143 unit tests for each query operator, testing all the scenarios from the spec (100% code coverage for the Sequence class; 97% code coverage in total). You can reuse these tests to check your modifications to the implementation if desired.

How to use?

The only thing you should do in order to use the custom implementation of the query operators instead of the System.Query implementation is to import

using BdsSoft.Query;

instead of

using System.Query;

By doing so, the compiler will find the extension methods in the custom library and use these instead of the System.Query.Sequence implementation. You can put breakpoints in the custom implementation code to verify this.

Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# C# 2.0 Iterators

Friday, July 07, 2006 12:28 AM by B# .NET Blog

Introduction
In my post about LINQ a couple of days ago, I promised to do a dive deep post on iterators...

# New and Notable 108

Friday, July 07, 2006 2:28 PM by Sam Gentile

Ruby/Rails

Martin notes that a video of his keynote at RailsConf is online and the rest are starting...

# re: LINQ - A custom implementation of the .NET Standard Query Operators

Friday, July 07, 2006 3:58 PM by Jim Wooley

Great article. I'm unclear why you decided to take on the effort of re-writing sequence.cs other than for the point of a learning exercise. Anyway, if any of your readers are interested, I have a write-up of extension methods in VB which are not well documented yet otherwise at http://devauthority.com/blogs/jwooley/archive/2006/05/21/1095.aspx

# Andy Maule&#8217;s blog &raquo; Blog Archive &raquo; Behind the scenes of LINQ

# LINQ-SQO project started on CodePlex

Saturday, August 05, 2006 12:59 AM by B# .NET Blog

Today, the LINQ-SQO&amp;nbsp;(&quot;A custom implementation of the .NET Standard Query Operators from LINQ&quot;)&amp;nbsp;project...

# How LINQ Works - IQueryable &laquo; The Wandering Glitch 2

Saturday, November 18, 2006 5:08 AM by How LINQ Works - IQueryable « The Wandering Glitch 2

# Simple Ling Query &laquo; Joachim Van den Bogaert&#8217;s Weblog

Pingback from  Simple Ling Query &laquo; Joachim Van den Bogaert&#8217;s Weblog

# New and Notable 108

Thursday, December 04, 2008 2:44 PM by Sam Gentile's Blog

Ruby/Rails Martin notes that a video of his keynote at RailsConf is online and the rest are starting to appear (PragDave is up there). John Lam talks about why Ruby (and a little bit about RubyCLR), on NET Rocks . LINQ/Data Daniel Cazzulino on how upcoming