Deferred Execution behavior in LINQ providers

LINQ uses a deferred execution model, which means that nothing really happens until the results of the query are accessed, e.g. in a foreach loop. One of the advantages of this model is that you can compose complex queries in multiple steps to make them more readable. So from an execution point of view, I expected that it would not matter whether you create the query in one complex statement or in several smaller ones. Unfortunately, that’s not always the case…

To see the difference I’ve created an XML document and two simple queries that create Album objects when the where-clause matches.

The ‘complex’ query:

var query1 = from a in albums.Descendants("Album")
             where a.Element("Artist").Value == "Radiohead"
             select new Album
             {
                 Artist = a.Element("Artist").Value,
                 Title = a.Element("Title").Value
             };

The decomposed, more readable query:

var query2 = from a in albums.Descendants("Album")
             select new Album
             {
                 Artist = a.Element("Artist").Value,
                 Title = a.Element("Title").Value
             };

query2 = from a in query2
         where a.Artist == "Radiohead"
         select a;

Both queries return exactly the same result, but they execute differently. In query2, an Album object is created for every Album element in the document, and only then is the ‘where’ part evaluated against each of those objects. In query1, the ‘where’ part is evaluated first, so an Album object is created only for elements that match. So in this example query1 is more efficient.

Is this what we should expect of deferred execution with LINQ? Yes, at least for implementations based on IEnumerable. What basically happens is that a chain of method calls is built up, and the methods run in the order in which they were added to the query. This makes query1 albums.Descendants("Album").Where(…).Select(…) and query2 albums.Descendants("Album").Select(…).Where(…).
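To make the difference measurable, here is a minimal sketch that writes both queries in method syntax and counts how many Album objects get constructed. The sample XML, the static Created counter and the DeferredExecutionDemo class are assumptions made for illustration; the XML document used for this post is not shown.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public class Album
{
    // Counts how many Album objects have been constructed (illustration only).
    public static int Created;

    public string Artist { get; set; } = "";
    public string Title { get; set; } = "";

    public Album() { Created++; }
}

public static class DeferredExecutionDemo
{
    // Hypothetical sample data: 4 albums, 2 of them by Radiohead.
    static readonly XElement albums = XElement.Parse(
        @"<Albums>
            <Album><Artist>Radiohead</Artist><Title>OK Computer</Title></Album>
            <Album><Artist>Radiohead</Artist><Title>Kid A</Title></Album>
            <Album><Artist>Portishead</Artist><Title>Dummy</Title></Album>
            <Album><Artist>Massive Attack</Artist><Title>Mezzanine</Title></Album>
          </Albums>");

    // query1: filter the XML elements first, then project the matches.
    public static int CreatedByFilterFirst()
    {
        Album.Created = 0;
        var query1 = albums.Descendants("Album")
            .Where(a => a.Element("Artist").Value == "Radiohead")
            .Select(a => new Album
            {
                Artist = a.Element("Artist").Value,
                Title = a.Element("Title").Value
            });

        // Nothing has run yet; ToList() forces the enumeration.
        query1.ToList();
        return Album.Created;
    }

    // query2: project every element first, then filter the Album objects.
    public static int CreatedByProjectFirst()
    {
        Album.Created = 0;
        var query2 = albums.Descendants("Album")
            .Select(a => new Album
            {
                Artist = a.Element("Artist").Value,
                Title = a.Element("Title").Value
            })
            .Where(a => a.Artist == "Radiohead");

        query2.ToList();
        return Album.Created;
    }
}
```

With this sample data, CreatedByFilterFirst() returns 2 and CreatedByProjectFirst() returns 4, even though both queries yield the same two albums.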

With implementations based on IQueryable, like LINQ to SQL, this is different: the query is captured as an expression tree that the provider can analyze and optimize. Using the same two queries with LINQ to SQL, both execute exactly the same! In each case only one SQL statement is sent to the database, and both statements include the where clause, which makes them equally efficient.
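The expression tree itself can be inspected without a database. The sketch below is only an illustration: it uses AsQueryable() over an empty in-memory array, which performs no optimization, and the AlbumRow and ExpressionTreeDemo names are hypothetical. It merely shows that a query composed in two steps is recorded as one tree, which is what gives a provider such as LINQ to SQL the chance to translate it as a whole.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity type, used only for this illustration.
public class AlbumRow
{
    public string Artist { get; set; } = "";
    public string Title { get; set; } = "";
}

public static class ExpressionTreeDemo
{
    // Returns the names of the chained query operators, outermost first.
    public static string[] OperatorChain()
    {
        IQueryable<AlbumRow> source = new AlbumRow[0].AsQueryable();

        // Same two-step composition as query2, but against IQueryable.
        var query2 = from a in source
                     select new AlbumRow { Artist = a.Artist, Title = a.Title };

        query2 = from a in query2
                 where a.Artist == "Radiohead"
                 select a;

        // Walk the tree: each operator is a method call node whose
        // first argument is the expression for the query it was applied to.
        var names = new List<string>();
        Expression node = query2.Expression;
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0];
        }
        return names.ToArray();
    }
}
```

OperatorChain() returns "Where" followed by "Select": both steps are present in a single tree before anything is executed, so a provider sees the whole query rather than two separate operations.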

In this example the difference isn’t much of a problem, but when more data is involved this is definitely something to be aware of when composing LINQ queries.