Lists

Let's use what we have learned about parameterized types to generalize our IntList type. In Java, generic containers such as ArrayList<T> exist exactly because it would be a terrible waste of energy to implement a separate ArrayList type from scratch for every possible element type we want to store in it. The implementation of the ArrayList does not change, no matter what elements we want to store in it. The same is true for a list. So let's define a list type that can store not only Ints but values of any type a:

data List a
    = Empty
    | Cons a (List a)

This is essentially the same definition as that of IntList, only the list type has gained a parameter a now that is the type of the elements we want to store in the list. The constructor for a non-empty list is no longer Cons Int IntList but Cons a (List a) because a non-empty list of as stores as its head an element of type a, and the tail of this list must itself be a list of as, that is, a List a.

All the functions we have defined for IntList so far can be converted into functions that work for arbitrary lists without changing their logic. Here's the head function:

listHead :: List a -> Maybe a
listHead Empty      = Nothing
listHead (Cons x _) = Just x

The implementation is the same, character for character, as our earlier implementation for IntList, only the type of the function has changed from IntList -> Maybe Int to List a -> Maybe a. That's an indication that making this function polymorphic in the element type a is the right thing to do: we gain the flexibility of polymorphism without any added complications in our implementation of the listHead function.

Here's the tail function:

listTail :: List a -> Maybe (List a)
listTail Empty       = Nothing
listTail (Cons _ xs) = Just xs

This example shows that the types we pass as arguments to type constructors such as Maybe don't have to be "simple" types. They can themselves be constructed from other types using type constructors, here List. Type constructors can nest. The mathematically inclined among you may gain a lot of clarity surrounding this nesting of type constructors from the following aside.

Aside: Type Functions

I said before that we call type constructors type constructors because they can be seen as functions that take an argument type and construct a new type from this argument type. Maybe is a type constructor that takes a as an argument and constructs from it the type Maybe a. Haskell is extremely consistent in this view of type constructors as functions. Let's build up our understanding of these type functions slowly.

Normal values and functions have types. The value 5 has type Int (or any number type, but let's ignore this little wrinkle here). The function even has the type Int -> Bool. The function (+) has type Int -> Int -> Int. So types capture what we can do with values.

Now when considering types such as Int, Double, and type constructors such as Maybe or List, we can think about concrete types such as Int and Double as the values we are working with, and Maybe and List become functions that manipulate these "values", that map types to types. To avoid mixing up the types of values and functions from values to values with the types of types and functions from types to types, Haskell calls the types of types and type constructors kinds. So a kind is the type of a type.

Now look at the analogy between values and types. 5 is a simple value, of type Int. Similarly, Int is a "simple type" (not a type constructor). Its kind is *. That's the symbol that Haskell uses for the kind of all types that can have values. You can check this in GHCi:

GHCi

>>> :kind Int
Int :: *
>>> :kind Double
Double :: *
>>> :kind Char
Char :: *

All of these are types that can have values. We have values of type Int, Double, Char, and so on. There are no values of type Maybe. There are only values of type Maybe Int or Maybe Char. You can check this again by asking for the kinds of these types in GHCi:

GHCi

>>> :kind Maybe
Maybe :: * -> *
>>> :kind Maybe Int
Maybe Int :: *
>>> :kind Maybe Char
Maybe Char :: *

Indeed, GHCi agrees with us that there exist values of type Maybe Int and Maybe Char. We know that we can create entities such as Just 1 or Just 'a', and GHCi tells us that these types have kind *. Maybe by itself, on the other hand, has kind * -> *. This also makes perfect sense, because Maybe is a type constructor, a type function. If we give it a type of kind *, such as Int, as an argument, it produces from it the type Maybe Int, which also has kind *. Thus, it maps a type of kind * to a type of kind *; its own kind is * -> *.

If we take types with multiple type parameters, such as Either discussed later in this book,

data Either a b = Left a | Right b

what should its kind be? Well, it takes two type arguments and produces from them a new type. So, its kind should be * -> * -> *, just as a function with arguments of type a and type b and return value of type c has type a -> b -> c. And, indeed, GHCi agrees with us:

GHCi

>>> :kind Either
Either :: * -> * -> *

Now if this analogy between types of values and functions and kinds of types and type constructors makes sense to you, you may/will/should ask whether all the other magic of Haskell functions—functions that take other functions as arguments, partial function application, etc.—also applies to type functions. Yes, it does.

There do exist type constructors such as

GHCi

>>> import Control.Monad.State
>>> :kind StateT
StateT :: * -> (* -> *) -> * -> *

This is a type constructor with three arguments. The first and last one must be simple types, of kind *. The second argument must itself be a type constructor, of kind * -> *, that is, one that maps a type of kind * to a type of kind *.

We can also partially apply type constructors, and the result will once again be a type constructor. For example, the Either type is often used to represent computations that can fail, similar to the Result type in Rust. In this case, we often have some type Error that represents errors. The return type of a function that may fail and normally returns some value of type a is Either Error a. If we use the same error type Error throughout a wide swath of our code, then writing Either Error a for all those return types gets tedious. Better to define a type alias:

type Result a = Either Error a

and then we can write Result Char, Result Int, etc. for the return types of our functions. However, we can define this type more succinctly:

type Result = Either Error

That's partial function application. Either has kind * -> * -> *. Here, we provided the first argument, Error, but not the second argument, so Either Error has kind * -> *. We assign the resulting type function to Result, by defining Result as an alias for it. We can now apply Result to any concrete type to produce another concrete type that is wrapped in Either Error.

By default, * is the only kind available for simple types that can have values, and functions between them are functions of kinds * -> *, * -> * -> *, * -> (* -> *) -> *, and so on. For the ninja Haskell programmer, there exists a Haskell extension that allows us to lift ordinary types to be used as kinds of other types. Used properly, this can further increase the type safety of our code, but exploring this here would go too far. This extension also isn't used that widely because in most circumstances, the safety guarantees provided by Haskell's "vanilla" type system are sufficient.

Let's look at our intListMap and intListFilter functions. First intListFilter. Its translation to general lists is once again the same, character for character, only the type has changed:

listFilter :: (a -> Bool) -> List a -> List a
listFilter pred = go
  where
    go Empty                   = Empty
    go (Cons x xs) | pred x    = Cons x (go xs)
                   | otherwise =         go xs

The intListMap function worked only for IntLists and, as a result, could also only take functions of type Int -> Int as the first argument: every list element of type Int in the input list had to be translated into a list element of the output list, which also had to be of type Int. The direct translation of this to arbitrary lists would be:

listMap :: (a -> a) -> List a -> List a
listMap f = go
  where
    go Empty       = Empty
    go (Cons x xs) = Cons (f x) (go xs)

but that's more restrictive than it has to be. This version of listMap can only produce a list of as from a list of as. But what if I want to convert a list of Doubles into a list of Ints by rounding each Double in the input list. listMap can't do this because Double and Int are different types. What we want is a function that can turn a List a into a List b, where a and b can be the same type, but they don't have to be. To produce a List b from a List a, we need a function that turns each list element of type a into a list element of type b, that is, a function of type a -> b. Well, that's easy:

listMap :: (a -> b) -> List a -> List b
listMap f = go
  where
    go Empty       = Empty
    go (Cons x xs) = Cons (f x) (go xs)

Only two letters changed in this implementation: the return type of the function f, and the type of the list elements in the list returned by listMap.

Exercise

The standard library contains an operator

(++) :: [a] -> [a] -> [a]

Given two lists, it returns their concatenation:

GHCi

>>> [1,2] ++ [3,4,5]
[1,2,3,4,5]

Implement a function listJoin that does the same for our custom list type List a:

listJoin :: List a -> List a -> List a

Solution

listJoin (Cons x xs) ys = Cons x (listJoin xs ys)
listJoin _           ys = ys

Exercise

The standard library contains a function foldr for lists:

foldr :: (a -> b -> b) -> b -> [a] -> b

The expression foldr f init xs takes the initial value init and traverses the list xs from right to left (starting at the end). For each list element x, it updates the value of init using the function f. In particular, it applies f to x and the current value of init to produce the new value of init. The result of foldr f init xs is the value of init obtained after traversing the entire list. For example,

GHCi

>>> foldr (+) 0 [1,2,3,4,5]
15

sums all the elements in the given list.

Implement a function listFoldR that does the same for our custom list type List a:

listFoldR :: (a -> b -> b) -> b -> List a -> b

Solution

listFoldR f = go
  where
    go accum (Cons x xs) = f x $ go accum xs
    go accum _           = accum

Exercise

The standard library contains a function

concat :: [[a]] -> [a]

Given a list of lists of as, it concatenates the elements in these lists into a single list:

GHCi

>>> concat [[1,2],[],[3],[4,5,6]]
[1,2,3,4,5,6]

Implement the same function for our custom list type List a:

listConcat :: List (List a) -> List a

Solution

Here's the easy implementation using the two functions listJoin and listFoldR from the previous two exercises. We start with the empty list and then add the given lists to the beginning of this list, using listJoin, from right to left.

listConcat = listFoldR listJoin Empty

A more explicit manual implementation that avoids listFoldR but still uses listJoin would be:

listConcat (Cons xs xss) = listJoin xs $ listConcat xss
listConcat _             = Empty

With a polymorphic list type, we can return to the solution of this exercise on implementing Merge Sort and fix the efficiency problem. Remember, the problem was that to split the list in half, we had to traverse the list once to determine its length, and then traverse the first half one more time to split the list into the two halves. Here is how we avoid this: Consider all the recursive calls that Merge Sort makes. At the bottom of the recursion, we have partitioned the list into one-element lists that are already sorted. That's why the recursion stops at these lists. As we back out of the recursion, we merge these lists in pairs to produce a single sorted list. Here's the trick then. We first convert our list of integers into a list of one-element lists. That's trivially a list of sorted lists because every one-element list is sorted. Now we repeatedly group the sorted lists in our list into pairs and merge them. This gives another list of sorted lists, but this one has half as many lists in it as before. By repeating this \(O(\lg n)\) times, we obtain a list containing a single sorted list. This one list is what Merge Sort needs to return. Here's what this looks like in Haskell:

mergeSort :: List Int -> List Int
mergeSort = mergeInPairs . mapList (\x -> Cons x Empty)

mergeInPairs :: List (List Int) -> List Int
mergeInPairs xss@(Cons _ (Cons _ _)) = mergeInPairs $ mergePairs xss
mergeInPairs (Cons xs _)             = xs
mergeInPairs _                       = Empty

mergePairs :: List (List Int) -> List (List Int)
mergePairs (Cons xs (Cons ys yss)) = Cons (merge xs ys) (mergePairs yss)
mergePairs xss                     = xss

So mergeSort applies mapList (\x -> Cons x Empty) to convert each element in its input list into a one-element list.¹ Then it merges them into a single list by calling mergeInPairs.

Given a list of lists, mergePairs gathers these lists in pairs and merges the lists in each pair. It collects the resulting merged lists in a list and returns this list. So it reduces a list of storted lists to a list of sorted lists with half as many lists in it. In that sense, it makes progress towards having a single sorted list.

mergeInPairs uses mergePairs recursively to produce a a single sorted list. If the input consists of at least two lists—it's a list of shape Cons _ (Cons _ _)—then we call mergePairs to reduce the number of lists to be merged and then call mergeInPairs recursively on this shorter list of lists to turn it into a single sorted list. Note that this first equation of mergeInPairs uses an at-pattern, which I introduced in the solution to this exercise. If the input consists of a list containing a single list, then this is our sorted list. We extract it and return it. The final branch is necessary to deal with the possibility that the input to mergeSort is empty. In this case, mergeSort also calls mergeInPairs on an empty list of lists, and the result of merging these lists is also the empty list.

Finally, mergePairs achieves its goal of merging the lists in its input list of lists in pairs as follows: If there are at least two lists in the input—the input is of shape Cons xs (Cons ys yss) with xs and ys being the first two lists and yss being the list of all remaining lists—then the first list in the result is merge xs ys. The remainder of the result is produced by merging the lists in yss recursively. If there is at most one list in the input, then there's nothing to merge; the result is just the list of lists given as input.

After we introduce Haskell's built-in list type properly in the next chapter, this function \x -> Cons x Empty, which is quite a mouthful, becomes simply \x -> [x] or, just, singleton. Using list comprehensions, we can in fact reduce the whole expression mapList (\x -> [x]) xs down to [[x] | x <- xs]. That's much more readable, but it is useful to work with our custom list type for a bit to understand how lists work under the hood. ↩