Lists
Let's use what we have learned about parameterized types to generalize our
IntList
type. In Java, generic containers such as ArrayList<T>
exist exactly
because it would be a terrible waste of energy to implement a separate
ArrayList
type from scratch for every possible element type we want to store
in it. The implementation of the ArrayList
does not change, no matter what
elements we want to store in it. The same is true for a list. So let's define a
list type that can store not only Int
s but values of any type a
:
data List a
= Empty
| Cons a (List a)
This is essentially the same definition as that of IntList
, only the list type
has gained a parameter a
now that is the type of the elements we want to store
in the list. The constructor for a non-empty list is no longer Cons Int IntList
but Cons a (List a)
because a non-empty list of a
s stores as its head an
element of type a
, and the tail of this list must itself be a list of a
s,
that is, a List a
.
All the functions we have defined for IntList
so far can be converted into
functions that work for arbitrary lists without changing their logic. Here's the
head function:
listHead :: List a -> Maybe a
listHead Empty = Nothing
listHead (Cons x _) = Just x
The implementation is the same, character for character, as our earlier
implementation for IntList
, only the type of the function has changed from
IntList -> Maybe Int
to List a -> Maybe a
. That's an indication that making
this function polymorphic in the element type a
is the right thing to do: we
gain the flexibility of polymorphism without any added complications in our
implementation of the listHead
function.
Here's the tail function:
listTail :: List a -> Maybe (List a)
listTail Empty = Nothing
listTail (Cons _ xs) = Just xs
This example shows that the types we pass as arguments to type constructors such
as Maybe
don't have to be "simple" types. They can themselves be constructed
from other types using type constructors, here List
. Type constructors can
nest. The mathematically inclined among you may gain a lot of clarity
surrounding this nesting of type constructors from the following aside.
Aside: Type Functions
I said before that we call type constructors type constructors because they
can be seen as functions that take an argument type and construct a new type
from this argument type. Maybe
is a type constructor that takes a
as an
argument and constructs from it the type Maybe a
. Haskell is extremely
consistent in this view of type constructors as functions. Let's build up
our understanding of these type functions slowly.
Normal values and functions have types. The value 5
has type Int
(or
any number type, but let's ignore this little wrinkle here). The function
even
has the type Int -> Bool
. The function (+)
has type
Int -> Int -> Int
. So types capture what we can do with values.
Now when considering types such as Int
, Double
, and type constructors
such as Maybe
or List
, we can think about concrete types such as Int
and Double
as the values we are working with, and Maybe
and List
become functions that manipulate these "values", that map types to types. To
avoid mixing up the types of values and functions from values to values with
the types of types and functions from types to types, Haskell calls the
types of types and type constructors kinds. So a kind is the type of a
type.
Now look at the analogy between values and types. 5
is a simple value, of
type Int
. Similarly, Int
is a "simple type" (not a type constructor).
Its kind is *
. That's the symbol that Haskell uses for the kind of all
types that can have values. You can check this in GHCi:
>>> :kind Int
Int :: *
>>> :kind Double
Double :: *
>>> :kind Char
Char :: *
All of these are types that can have values. We have values of type Int
,
Double
, Char
, and so on. There are no values of type Maybe
. There are
only values of type Maybe Int
or Maybe Char
. You can check this again by
asking for the kinds of these types in GHCi:
>>> :kind Maybe
Maybe :: * -> *
>>> :kind Maybe Int
Maybe Int :: *
>>> :kind Maybe Char
Maybe Char :: *
Indeed, GHCi agrees with us that there exist values of type Maybe Int
and
Maybe Char
. We know that we can create entities such as Just 1
or
Just 'a'
, and GHCi tells us that these types have kind *
. Maybe
by
itself, on the other hand, has kind * -> *
. This also makes perfect sense,
because Maybe
is a type constructor, a type function. If we give it a type
of kind *
, such as Int
, as an argument, it produces from it the type
Maybe Int
, which also has kind *
. Thus, it maps a type of kind *
to a
type of kind *
; its own kind is * -> *
.
If we take types with multiple type parameters, such as Either
discussed
later in this book,
data Either a b = Left a | Right b
what should its kind be? Well, it takes two type arguments and produces
from them a new type. So, its kind should be * -> * -> *
, just as a
function with arguments of type a
and type b
and return value of type
c
has type a -> b -> c
. And, indeed, GHCi agrees with us:
>>> :kind Either
Either :: * -> * -> *
Now if this analogy between types of values and functions and kinds of types and type constructors makes sense to you, you may/will/should ask whether all the other magic of Haskell functions—functions that take other functions as arguments, partial function application, etc.—also applies to type functions. Yes, it does.
There do exist type constructors such as
>>> import Control.Monad.State
>>> :kind StateT
StateT :: * -> (* -> *) -> * -> *
This is a type constructor with three arguments. The first and last one must
be simple types, of kind *
. The second argument must itself be a type
constructor, of kind * -> *
, that is, one that maps a type of kind *
to a
type of kind *
.
We can also partially apply type constructors, and the result will once
again be a type constructor. For example, the Either
type is often used to
represent computations that can fail, similar to the Result
type in
Rust
. In this case, we often have some type Error
that represents
errors. The return type of a function that may fail and normally returns
some value of type a
is Either Error a
. If we use the same error type
Error
throughout a wide swath of our code, then writing Either Error a
for all those return types gets tedious. Better to define a type alias:
type Result a = Either Error a
and then we can write Result Char
, Result Int
, etc. for the return types
of our functions. However, we can define this type more succinctly:
type Result = Either Error
That's partial function application. Either
has kind * -> * -> *
. Here,
we provided the first argument, Error
, but not the second argument, so
Either Error
has kind * -> *
. We assign the resulting type function to
Result
, by defining Result
as an alias for it. We can now apply Result
to any concrete type to produce another concrete type that is wrapped in
Either Error
.
By default, *
is the only kind available for simple types that can have
values, and functions between them are functions of kinds * -> *
,
* -> * -> *
, * -> (* -> *) -> *
, and so on. For the ninja Haskell
programmer, there exists a Haskell extension that allows us to lift ordinary
types to be used as kinds of other types. Used properly, this can further
increase the type safety of our code, but exploring this here would go too
far. This extension also isn't used that widely because in most
circumstances, the safety guarantees provided by Haskell's "vanilla" type
system are sufficient.
Let's look at our intListMap
and intListFilter
functions. First
intListFilter
. Its translation to general lists is once again the same,
character for character, only the type has changed:
listFilter :: (a -> Bool) -> List a -> List a
listFilter pred = go
where
go Empty = Empty
go (Cons x xs) | pred x = Cons x (go xs)
| otherwise = go xs
The intListMap
function worked only for IntList
s and, as a result, could
also only take functions of type Int -> Int
as the first argument: every list
element of type Int
in the input list had to be translated into a list element
of the output list, which also had to be of type Int
. The direct translation
of this to arbitrary lists would be:
listMap :: (a -> a) -> List a -> List a
listMap f = go
where
go Empty = Empty
go (Cons x xs) = Cons (f x) (go xs)
but that's more restrictive than it has to be. This version of listMap
can
only produce a list of a
s from a list of a
s. But what if I want to convert a
list of Double
s into a list of Int
s by rounding each Double
in the input
list. listMap
can't do this because Double
and Int
are different types.
What we want is a function that can turn a List a
into a List b
, where a
and b
can be the same type, but they don't have to be. To produce a List b
from a List a
, we need a function that turns each list element of type a
into a list element of type b
, that is, a function of type a -> b
. Well,
that's easy:
listMap :: (a -> b) -> List a -> List b
listMap f = go
where
go Empty = Empty
go (Cons x xs) = Cons (f x) (go xs)
Only two letters changed in this implementation: the return type of the function
f
, and the type of the list elements in the list returned by listMap
.
Exercise
The standard library contains an operator
(++) :: [a] -> [a] -> [a]
Given two lists, it returns their concatenation:
>>> [1,2] ++ [3,4,5]
[1,2,3,4,5]
Implement a function listJoin
that does the same for our custom list type
List a
:
listJoin :: List a -> List a -> List a
Solution
listJoin (Cons x xs) ys = Cons x (listJoin xs ys)
listJoin _ ys = ys
Exercise
The standard library contains a function foldr
for lists:
foldr :: (a -> b -> b) -> b -> [a] -> b
The expression foldr f init xs
takes the initial value init
and
traverses the list xs
from right to left (starting at the end). For each
list element x
, it updates the value of init
using the function f
. In
particular, it applies f
to x
and the current value of init
to produce
the new value of init
. The result of foldr f init xs
is the value of
init
obtained after traversing the entire list. For example,
>>> foldr (+) 0 [1,2,3,4,5]
15
sums all the elements in the given list.
Implement a function listFoldR
that does the same for our custom list type
List a
:
listFoldR :: (a -> b -> b) -> b -> List a -> b
Solution
listFoldR f = go
where
go accum (Cons x xs) = f x $ go accum xs
go accum _ = accum
Exercise
The standard library contains a function
concat :: [[a]] -> [a]
Given a list of lists of a
s, it concatenates the elements in these lists
into a single list:
>>> concat [[1,2],[],[3],[4,5,6]]
[1,2,3,4,5,6]
Implement the same function for our custom list type List a
:
listConcat :: List (List a) -> List a
Solution
Here's the easy implementation using the two functions listJoin
and
listFoldR
from the previous two exercises. We start with the empty
list and then add the given lists to the beginning of this list, using
listJoin
, from right to left.
listConcat = listFoldR listJoin Empty
A more explicit manual implementation that avoids listFoldR
but still
uses listJoin
would be:
listConcat (Cons xs xss) = listJoin xs $ listConcat xss
listConcat _ = Empty
With a polymorphic list type, we can return to the solution of this exercise on implementing Merge Sort and fix the efficiency problem. Remember, the problem was that to split the list in half, we had to traverse the list once to determine its length, and then traverse the first half one more time to split the list into the two halves. Here is how we avoid this: Consider all the recursive calls that Merge Sort makes. At the bottom of the recursion, we have partitioned the list into one-element lists that are already sorted. That's why the recursion stops at these lists. As we back out of the recursion, we merge these lists in pairs to produce a single sorted list. Here's the trick then. We first convert our list of integers into a list of one-element lists. That's trivially a list of sorted lists because every one-element list is sorted. Now we repeatedly group the sorted lists in our list into pairs and merge them. This gives another list of sorted lists, but this one has half as many lists in it as before. By repeating this \(O(\lg n)\) times, we obtain a list containing a single sorted list. This one list is what Merge Sort needs to return. Here's what this looks like in Haskell:
mergeSort :: List Int -> List Int
mergeSort = mergeInPairs . mapList (\x -> Cons x Empty)
mergeInPairs :: List (List Int) -> List Int
mergeInPairs xss@(Cons _ (Cons _ _)) = mergeInPairs $ mergePairs xss
mergeInPairs (Cons xs _) = xs
mergeInPairs _ = Empty
mergePairs :: List (List Int) -> List (List Int)
mergePairs (Cons xs (Cons ys yss)) = Cons (merge xs ys) (mergePairs yss)
mergePairs xss = xss
So mergeSort
applies mapList (\x -> Cons x Empty)
to convert each element in
its input list into a one-element list.1 Then it merges them into
a single list by calling mergeInPairs
.
Given a list of lists, mergePairs
gathers these lists in pairs and merges the
lists in each pair. It collects the resulting merged lists in a list and returns
this list. So it reduces a list of storted lists to a list of sorted lists with
half as many lists in it. In that sense, it makes progress towards having a
single sorted list.
mergeInPairs
uses mergePairs
recursively to produce a a single sorted list.
If the input consists of at least two lists—it's a list of shape
Cons _ (Cons _ _)
—then we call mergePairs
to reduce the number of lists to
be merged and then call mergeInPairs
recursively on this shorter list of lists
to turn it into a single sorted list. Note that this first equation of
mergeInPairs
uses an at-pattern, which I introduced in the solution to
this exercise. If the input consists of a
list containing a single list, then this is our sorted list. We extract it and
return it. The final branch is necessary to deal with the possibility that the
input to mergeSort
is empty. In this case, mergeSort
also calls
mergeInPairs
on an empty list of lists, and the result of merging these lists
is also the empty list.
Finally, mergePairs
achieves its goal of merging the lists in its input list
of lists in pairs as follows: If there are at least two lists in the input—the
input is of shape Cons xs (Cons ys yss)
with xs
and ys
being the first two
lists and yss
being the list of all remaining lists—then the first list in the
result is merge xs ys
. The remainder of the result is produced by merging the
lists in yss
recursively. If there is at most one list in the input, then
there's nothing to merge; the result is just the list of lists given as input.
-
After we introduce Haskell's built-in list type properly in the next chapter, this function
\x -> Cons x Empty
, which is quite a mouthful, becomes simply\x -> [x]
or, just,singleton
. Using list comprehensions, we can in fact reduce the whole expressionmapList (\x -> [x]) xs
down to[[x] | x <- xs]
. That's much more readable, but it is useful to work with our custom list type for a bit to understand how lists work under the hood. ↩