Why LINQ (well, C#) is Broken
24 October, 2016
LINQ is a system that provides a flexible query interface for .NET languages. It allows a user to write queries over arbitrary data using an in-built SQL-like syntax. This syntactic sugar is mapped to method calls at compile time, so any data structure that implements the correct methods can be used with LINQ.
The essential methods for enabling LINQ support are Select
and SelectMany
,
implemented as extension methods. They have the following types:
<B> Select<A,B>(this SomeData<A> a, Func<A,B> f)
SomeData<B> SelectMany<A,B>(this SomeData<A> a, Func<A,SomeData<B>> f)
SomeData<C> SelectMany<A,B,C>(this SomeData<A> a, Func<A,SomeData<B>> f, Func<A,B,C> g) // Overloaded to reduce levels of nesting SomeData
With implementations of these three methods, it is possible to write a query expression such as:
<A> myA = ...;
SomeData<B> myB = ...;
SomeData<A,B,C> f = ...;
Func<C> myC = from a in myA
SomeDatain myB
from b f(a,b); select
which will be compiled to something like:
SomeData<C> output = justWord.SelectMany(a => myB, (a, b) => f(a, b));
Readers who are familiar with Haskell or similar functional languages will
notice that Select
is fmap
, SelectMany
is >>=
and the
from .. in .. select
syntax is equivalent to Monad comprehensions. The above
code would be written in Haskell as follows:
= ...
myA = ...
myB = ...
f a b = do
myC <- myA
a <- myB
b return $ f a b
LINQ was designed to bring Monad comprehensions to C#. And it does. Almost.
Consider our query from earlier:
...
<C> myC = from a in myA
SomeDatain myB
from b f(a,b); select
This seems like a common pattern. We don’t want to write this code over and
over, so we abstract myA
, myB
and f
and make the query into a method.
<C> CombineWith<A,B,C>(SomeData<A> myA, SomeData<B> myB, Func<A,B,C> f)
SomeData{
return from a in myA from b in myB select f(a,b);
}
Now say we define a new data type to use with LINQ, call it OtherData<A>
, and
implement Select
and SelectMany
appropriately. We also want to implement
CombineWith
because from .. in .. from .. in .. select ..
is still a common
pattern that we want to avoid writing:
<C> CombineWith<A,B,C>(OtherData<A> myA, OtherData<B> myB, Func<A,B,C> f)
OtherData{
return from a in myA from b in myB select f(a,b);
}
There is a pattern emerging. For every data type that we want to use with LINQ, one must reimplement all LINQ-specific methods specifically for that type.
This is an issue because it grossly violates DRY (don’t repeat yourself). A well-written program should not have duplicated code - it makes maintenance more laborious and increases the chance of bugs.
So in an effort to save ourselves time, we should abstract over this common pattern. We require a function that specifies
for all generic classes
F<???>
implementingSelect
andSelectMany
, given an instance ofF
containingA
s, another instance ofF
containingB
s, and aFunc<A,B,C
, return anF
containingC
s
It turns out that it’s actually impossible to write this method in C#. I’d like
to write something like
F<C> CombineWith<F<?>,A,B,C>(F<A> myA, F<B> myB, Func<A,B,C> f)
, but C# only
allows abstraction over non-generic types.
To add a little more weight to this revelation, let’s imagine if we could
not abstract over the contents of a list ie. the method
List<A> Sort<A>(List <A> input)
cannot be expressed in this language. Due to
this limitation, we would have to create a new list class every time we needed
a different element type inside the list, then reimplement Sort
for each new
class.
ListOfInt.Sort
ListOfBool.Sort
ListOfSomeData.Sort
…
This is again a terrible violation of the “don’t repeat yourself” principle.
You write n
implementations of Sort
, where n
is the number of sortable
classes. Imagine that each implementation used the
proven incorrect version of TimSort.
If you wanted to implement the correct version, you would have to update n
methods.
Also consider the implementation of
List<B> Map<A,B>(List<A> input, Func<A,B> f)
in a generic-less language. You
would have to write a different method for each inhabitant of A
and B
ListOfInt.MapToListOfInt
ListOfInt.MapToListOfBool
ListOfInt.MapToListOfSomeData
ListOfBool.MapToListOfBool
…
You write n^2
Map
methods where n
is the number of of mappable classes.
More generally, in this generic-less language, you write O(n^m)
where m
is
the sum of should-be-generic inputs and should-be-generic outputs, and n
is
the number of should-be-generic classes.
This exponential growth of redundant nonsense also applies to our CombineWith
issue. For every LINQ-able class, you have to write a separate implementation
of CombineWith
, even though it’s exactly the same code!
Haskell (and other sane functional languages) uses a concept called “Higher
Kinded Types” to address this problem. Every type has a “kind” (denoted *
). In
C#, every type must have kind *
. Higher-kinded types are functions from kinds
to kinds. Given data declaration that has a single type variable, say
Maybe a = Just a | Nothing
, we say that Maybe
has kind * -> *
, which means
that it is a higher-kinded type that takes a type of kind *
and returns a type
of kind *
. In C#, every type must have kind *
ie. if you have a defined the
class List<A>
then you get a compile error if you refer to List
without
the type argument.
Let’s take another look the Haskell implementation of CombineWith
:
combineWith :: Monad m => m a -> m b -> (a -> b -> c) -> m c
= do
combineWith myA myB f <- myA
a <- myB
b return $ f a b
In this function, and the definition of the Monad typeclass (read: interface)m
implicitly has kind * -> *
. This function will work for any type that is
an instance of Monad (read: implements the Monad interface). In Haskell, this
code only needs to be written once. The cost of implementation and maintenance
of a group of functions has gone from O(n^m) to O(1).
Now you might say, “Well, I don’t use LINQ like that. I only use it for
IEnumerable
things”. This is akin to a user of our imaginary generic-less
language saying “Well, I don’t use Sort like that. I only sort lists of
integers”. It is agreed that a language without generics is counter to
productivity. It follows that a language without higher-kinded types is also
counter to productivity.