Why LINQ (well, C#) is Broken
24 October 2016
LINQ is a system that provides a flexible query interface for .NET languages. It allows a user to write queries over arbitrary data using an in-built SQL-like syntax. This syntactic sugar is mapped to method calls at compile time, so any data structure that implements the correct methods can be used with LINQ.
The essential methods for enabling LINQ support are
implemented as extension methods. They have the following types:
With implementations of these three methods, it is possible to write a query expression such as:
which will be compiled to something like:
SomeData<C> output = justWord.SelectMany(a => myB, (a, b) => f(a, b));
Readers who are familiar with Haskell or similar functional languages will
>>= and the
from .. in .. select syntax is equivalent to Monad comprehensions. The above
code would be written in Haskell as follows:
LINQ was designed to bring Monad comprehensions to C#. And it does. Almost.
Consider our query from earlier:
This seems like a common pattern. We don’t want to write this code over and
over, so we abstract
f and make the query into a method.
Now say we define a new data type to use with LINQ, call it
SelectMany appropriately. We also want to implement
from .. in .. from .. in .. select .. is still a common
pattern that we want to avoid writing:
There is a pattern emerging. For every data type that we want to use with LINQ, one must reimplement all LINQ-specific methods specifically for that type.
This is an issue because it grossly violates DRY (don’t repeat yourself). A well-written program should not have duplicated code - it makes maintenance more laborious and increases the chance of bugs.
So in an effort to save ourselves time, we should abstract over this common pattern. We require a function that specifies
for all generic classes
SelectMany, given an instance of
As, another instance of
Bs, and a
Func<A,B,C, return an
It turns out that it’s actually impossible to write this method in C#. I’d like
to write something like
F<C> CombineWith<F<?>,A,B,C>(F<A> myA, F<B> myB, Func<A,B,C> f), but C# only
allows abstraction over non-generic types.
To add a little more weight to this revelation, let’s imagine if we could
not abstract over the contents of a list ie. the method
List<A> Sort<A>(List <A> input) cannot be expressed in this language. Due to
this limitation, we would have to create a new list class every time we needed
a different element type inside the list, then reimplement
Sort for each new
This is again a terrible violation of the “don’t repeat yourself” principle.
n implementations of
n is the number of sortable
classes. Imagine that each implementation used the
proven incorrect version of TimSort.
If you wanted to implement the correct version, you would have to update
Also consider the implementation of
List<B> Map<A,B>(List<A> input, Func<A,B> f) in a generic-less language. You
would have to write a different method for each inhabitant of
Map methods where
n is the number of of mappable classes.
More generally, in this generic-less language, you write
the sum of should-be-generic inputs and should-be-generic outputs, and
the number of should-be-generic classes.
This exponential growth of redundant nonsense also applies to our
issue. For every LINQ-able class, you have to write a separate implementation
CombineWith, even though it’s exactly the same code!
Haskell (and other sane functional languages) uses a concept called “Higher
Kinded Types” to address this problem. Every type has a “kind” (denoted
C#, every type must have kind
*. Higher-kinded types are functions from kinds
to kinds. Given data declaration that has a single type variable, say
Maybe a = Just a | Nothing, we say that
Maybe has kind
* -> *, which means
that it is a higher-kinded type that takes a type of kind
* and returns a type
*. In C#, every type must have kind
* ie. if you have a defined the
List<A> then you get a compile error if you refer to
the type argument.
Let’s take another look the Haskell implementation of
In this function, and the definition of the Monad typeclass (read: interface)
implicitly has kind
* -> *. This function will work for any type that is
an instance of Monad (read: implements the Monad interface). In Haskell, this
code only needs to be written once. The cost of implementation and maintenance
of a group of functions has gone from O(n^m) to O(1).
Now you might say, “Well, I don’t use LINQ like that. I only use it for
IEnumerable things”. This is akin to a user of our imaginary generic-less
language saying “Well, I don’t use Sort like that. I only sort lists of
integers”. It is agreed that a language without generics is counter to
productivity. It follows that a language without higher-kinded types is also
counter to productivity.