head and tail

So, we can build lists using (:) and (++). To inspect the parts of a list, we use pattern matching. Remember, [] and (:) are data constructors, so we can use them in patterns. For inspecting the head or tail of a list, we also have

head :: [a] -> a
head (x:_) = x
head []    = error "Cannot take head of the empty list"

tail :: [a] -> [a]
tail (_:xs) = xs
tail []     = error "Cannot take tail of the empty list"

head and tail are partial functions that will make your program crash if you try to apply them to the empty list. We discussed before how to make these functions safe by making them return their result wrapped in Maybe, and returning Nothing if we try to take the head or tail of the empty list. There is a replacement of the standard library that does exactly this, emphasizing safety above everything else. The standard library opted for convenience over safety, because if the return value is wrapped in Maybe, then we need to unwrap it using pattern matching to use it.

Exercise

We can use pattern matching or head and tail to take a list apart into its head and tail. It isn't uncommon to want to split a list xs into two lists, one containing the first \(n\) elements of xs, and the other containing the remaining elements. Or maybe we only want one of these two lists. The standard library has three functions that do this kind of stuff:

take :: Int -> [a] -> [a]
drop :: Int -> [a] -> [a]
splitAt :: Int -> [a] -> ([a], [a])

take n xs returns the list containing the first n elements of xs. If xs has fewer than n elements, then the result is the entire list:

GHCi

>>> take 3 [1,2,3,4,5]
[1,2,3]
>>> take 10 [1,2,3,4,5]
[1,2,3,4,5]

drop n xs returns the list obtained by removing the first n elements of xs. If xs has fewer than n elements, then the result is the empty list:

GHCi

>>> drop 3 [1,2,3,4,5]
[4,5]
>>> drop 10 [1,2,3,4,5]
[]

splitAt n xs returns both lists. The result is a pair where the first list contains the first n elements of xs, and the second list contains the rest. If xs has fewer than n elements, then the result is (xs, []). In other words, splitAt n xs behaves as if it were implemented as

splitAt n xs = (take n xs, drop n xs)

but this implementation would be inefficient because take n xs and drop n xs would both traverse the first n elements of xs. The actual implementation of splitAt n xs achieves the split using a single traversal of the first n elements of xs. Figure out how to do this. Implement splitAt without using take and drop, and then use splitAt to implement take and drop. (You did the splitAt part of this exercise before, for our custom IntList type. This exercise asks you for implementing the real splitAt function, for lists.)

Solution

splitAt :: Int -> [a] -> ([a], [a])
splitAt 0 xs     = ([], xs)
splitAt _ []     = ([], [])
splitAt n (x:xs) = (n : ls, rs)
  where
    (ls, rs) = splitAt (n - 1) xs

take, drop :: Int -> [a] -> [a]
take n = fst . splitAt n
drop n = snd . splitAt n

Exercise

head and tail have two cousins called last and init. While head returns the first element in a list, last returns the last element in the list. Similarly, tail returns the list with its first element removed, and init returns the list with its last element removed:

GHCi

>>> last [1,2,3,4,5]
5
>>> init [1,2,3,4,5]
[1,2,3,4]

Just as head and tail, last and init are undefined when the input list is empty.

Given how lists are implemented, head and tail are constant-time operations. The same is not true for last and init. They need to walk down the entire list to find the last element and either return it or remove it. Still, it is useful to have these functions for when we really need them.

Implement last and init.

Solution

last :: [a] -> a
last []     = error "Cannot take last element of an empty list"
last [x]    = x
last (_:xs) = last xs

init :: [a] -> [a]
init []     = error "Cannot take init of an empty list"
init [x]    = []
init (x:xs) = x : init xs

Exercise

Lists are great as long as we add, remove, and modify elements at the head of the list. For example, we can easily implement a stack based on lists:

newtype Stack a = Stack [a]

emptyStack :: Stack a
emptyStack = Stack []

isEmpty :: Stack a -> Bool
isEmpty (Stack xs) = null xs

push :: a -> Stack a -> Stack a
push x (Stack xs) = Stack (x:xs)

pop :: Stack a -> (a, Stack a)
pop (Stack [])     = error "Cannot pop from empty stack"
pop (Stack (x:xs)) = (x, Stack xs)

A queue is harder to implement as a list because the enqueue operations adds elements at one end of the queue, and the dequeue operation removes elements at the other end. There do exist purely functional queue implementations that support enqueue and dequeue in worst-case constant time, but they're too advanced to discuss here. After defining q' = enqueue x q, we still have access to the old version of q, and there's nothing that prevents us from throwing away q' and applying another operation to q instead. Algorithms people call such data structures persistent: we have access to all previous versions of a data structure. Immutability gives us persistence "for free". At the same time, persistent data structures aren't easy to implement. As a case in point, even achieving amortized constant time[^amortization] for enqueue and dequeue operations in the presence of persistence isn't easy. So we'll settle for something much less ambitious here. We want a queue that achieves amortized constant time for enqueue and dequeue operations as long as we do not use persistence: After producing a new queue q' using q' = enqueue x q or (x, q') = dequeue q, we never access q again.

Such a queue can be implemented using two lists:

data Queue a = Queue
    { front :: [a]
    , back  :: [a]
    }

You should think about this queue as being split into two parts at an arbitrary position. The elements before this split position are stored in front. The elements after this split position are stored in back. However, there's a twist: The elements in front are stored in the order in which they occur in the queue, while the elements in back are stored in reverse order.

Implement the following operations for this Queue a data type:

emptyQueue :: Queue a should return an empty queue.
isEmpty :: Queue a -> Bool should report whether the given queue is empty.
enqueue :: a -> Queue a -> Queue a should add a new element to the end of the queue. Specifically, enqueue x q returns a new queue that is the same as q but with x added to the end.
dequeue :: Queue a -> (a, Queue a) takes a queue q and returns a pair (x, q'). x is the element at the front of q. q' is the queue obtained by removing x from the front of q.

Argue that these operations take amortized constant time, that is, any sequence of \(n\) such operations takes \(O(n)\) time in total.

Solution

emptyQueue and isEmpty are fairly trivial:

emptyQueue :: Queue a
emptyQueue = Queue [] []

isEmpty :: Queue a -> Bool
isEmpty (Queue f b) = null f && null b

Given that back stores a list of elements at the back of q in reverse order, appending a new element to the end of q is as simple as adding this element to the front of back q:

enqueue :: a -> Queue a -> Queue a
enqueue x q = q { back = x : back q }

Where things get a little more interesting is dequeue. If front q is non-empty, then we can simply remove the head of front q:

dequeue (Queue (x:f) b) = (x, Queue f b)

But what if front q = []? In that case, we want to remove the last element of back q. We could use init and last to accomplish this, but that would take linear time every time we call dequeue. Here's a nifty little trick to avoid this cost.

Logically, the two queues Queue [] b and Queue (reverse b) [] represent the same sequence. For a list xs, reverse xs is the reversal of xs:

GHCi

>>> reverse [1,2,3,4,5]
[5,4,3,2,1]

Thus, we convert Queue [] b into Queue (reverse b) [], and then we can remove from the head of reverse b as before. The whole implementation of dequeue looks like this:

dequeue :: Queue a -> (a, Queue a)
dequeue (Queue []    []) = error "Cannot dequeue from an empty queue"
dequeue (Queue []    b ) = dequeue (Queue (reverse b) [])
dequeue (Queue (x:f) b ) = (x, Queue f b)

Whenever front q is empty, dequeue q takes linear time in the length of back q, because reverse b takes linear time:

reverse :: [a] -> [a]
reverse = go []
  where
    go rev []     = rev
    go rev (x:xs) = go (x:rev) xs

The amortized cost per enqueue and dequeue operation is constant though. Indeed, enqueue takes constant time to add its argument to the head of back q. Each element moves from back q to front q exactly once, whenever front q is empty. This takes linear time in the length of back q, that is, constant time per element in back q. Thus, we can charge the cost of moving elements from back q to front q to the enqueue operations that add these elements to back q, at constant cost per enqueue operation. With the moving of elements from back q to front q paid for, all that dequeue does now is remove elements from the front of front q, which takes constant time per dequeue operation.