Exploring a Maze
We represent a maze as a 2-d array of cells:
newtype Maze = Maze { cells :: Array Pos Cell }
type Pos = (Int, Int)
For each maze cell, we store whether it has a wall above it, to its left, below it or to its right:
data Cell = Cell
{ wallAbove :: Bool
, wallBelow :: Bool
, wallOnLeft :: Bool
, wallOnRight :: Bool
}
This representation is redundant because, for example, a cell has a wall above
it if and only if the cell above it has a cell below it—we're storing the same
information twice. Using a whole Bool—which uses one byte—to store whether
each wall is present or not is also wasteful. Clearly, one bit would be enough.
If we wanted a really fast implementation of our path search through a maze,
we'd use a more compact representation that uses one bit per wall. Here, we are
only interested in learning to work with the list monad though.
To make the problem simpler, we will assume that there are no cycles in the maze: there is no way to go from a cell back to itself without retracing our steps:

A \(6 \times 6\) maze and the unique path from the top-left cell of the maze to the bottom-right cell of the maze.
What we'd like to implement then is a function
findPath :: Maze -> Pos -> Pos -> Path
type Path = [Pos]
Given a maze and two positions start and end in the maze,
findPath maze start end finds a path from start to end. It returns this
path as a list of the maze positions visited by the path.
Let's see how we can describe a path from start to end: It needs to start at
position start. If start == end, then that's it; we're already where we want
to go. If start /= end, then we move from start to some neighbour of
start, and from there continue our path towards end. Here's how we express
this using the list monad:
findPath maze start end = head $ search start
where
((minCol, minRow), (maxCol, maxRow)) = bounds $ cells maze
search current
| current == end = return [current]
| otherwise = do
nbr <- neighbour current
pathFromNbr <- search nbr
return $ current : pathFromNbr
We use search start to find all paths we can find from start to end. To
extract the first path in this list, we use head.
Given the current position current, search current finds a path from
current to end. This is the exact translation of our description above: If
current == end, then there is exactly one path from current to end, and
this path contains only current. We return this path. Note that there are
two lists involved here: A path is a list of positions. Here, this path
contains only the position current, it is the list [current]. However,
search should produce the list of all paths from current to end. So
search current should return the list containing this path.
return [current] produces this list [[current]]. Our entire implementation
of findPath works with lists of lists of maze positions where the inner lists
are simply used to represent paths, and the outer lists are used to represent
the non-determinism of the search.
If current /= end, then we pick a neighbour nbr of current and find a path
from nbr to end. The path from current to end consists of current
followed by the path from nbr to end, that is, it is
current : pathFromNbr. That's the path we return.
Before discussing how neighbour is implemented, let's discuss the magic of the
list monad. The implementation of the otherwise branch reads like a simple
sequence of steps:
do
nbr <- neighbour current
pathFromNbr <- search nbr
return $ current : pathFromNbr
Pick a neighbour, search for a path from this neighbour, and then return the path with the neighbour prepended to it. How do we know which neighour to pick?
We don't. neighbour returns the list of all neighbours of current. Next
remember that the above do-block gets desugared to
neighbour current >>= \nbr -> search nbr
>>= \pathFromNbr -> return $ current : pathFromNbr
that return x = [x] for the list monad, so we get
neighbour current >>= \nbr -> search nbr
>>= \pathFromNbr -> [current : pathFromNbr]
and that the (>>=) operator for the list monad is concatMap, so we get
concatMap ( \nbr -> concatMap (\pathFromNbr -> [current : pathFromNbr])
(search nbr)
)
neighbour
The function
\nbr -> concatMap (\pathFromNbr -> [current : pathFromNbr])
(search nbr)
calls search nbr. It turns every path pathFromNbr in the list this returns
into the list [current : pathFromNbr], and then it concatenates these lists
(since we use concatMap, not map). What we obtain is the list of all paths
to end that start at current, visit nbr next, and then somehow continue to
end. The whole do-block concatMaps this function over the list of all
neighbours of current returned by neighbour
current. Thus, we obtain the list of all paths from current to end.
So, even though the do-block looks like a simple sequence of steps, it
actually performs a whole search, because all the steps after
neighbour current are executed once for each value nbr in the list returned
by neighbour current. That's the magic of the list monad.
Now let's look at the neighbour function:
neighbour current@(x, y) = do
(dx, dy, wallToCheck) <- [
(-1, 0, wallOnLeft), (1, 0, wallOnRight),
(0, -1, wallAbove), (0, 1, wallBelow)
]
let x' = x + dx
y' = y + dy
guard (x' >= minCol && x' <= maxCol)
guard (y' >= minRow && y' <= maxRow)
guard (not $ wallToCheck $ cells maze ! current)
return (x', y')
Note that this function must be defined as a local function of findPath
because it needs access to maze and to some of the local variables defined in
findPath. So our whole implementation looks like this:
findPath :: Maze -> Pos -> Pos -> Path
findPath maze start end = head $ search start
where
((minCol, minRow), (maxCol, maxRow)) = bounds $ cells maze
search current
| current == end = return [current]
| otherwise = do
nbr <- neighbour current
pathFromNbr <- search nbr
return $ current : pathFromNbr
neighbour current@(x, y) = do
(dx, dy, wallToCheck) <- [
(-1, 0, wallOnLeft), (1, 0, wallOnRight),
(0, -1, wallAbove), (0, 1, wallBelow)
]
let x' = x + dx
y' = y + dy
guard (x' >= minCol && x' <= maxCol)
guard (y' >= minRow && y' <= maxRow)
guard (not $ wallToCheck $ cells maze ! current)
return (x', y')
Here's how neighbour works. First, we extract a direction in which we can move
from the current position. We can move to the left, which corresponds to
decreasing x by 1 and leaving y unchanged. This is allowed only if
wallOnLeft is false for the current position. We can also move to the right,
which corresponds to increasing x by 1 and leaving y unchanged. This is
allowed only if wallOnRight is false for the current position. The other two
options are moving up or down. To get a single neighbour of current, we pick a
change in x, which we call dx, a change in y, which we call dy, and the
corresponding wall whose absence we need to check, which we call wallToCheck.
From dx and dy, we compute the coordinates of the chosen neighbour by adding
dx to x and dy to y. We are allowed to move to this neighbour if this
neighbour actually exists—x' >= minCol && x' <= maxCol and
y' >= minRow && y' <= maxRow—and there is no wall between current and this
neighbour—not $ wallToCheck $ cells maze ! current. We check these conditions
using guard, which we'll discuss shortly. If the neighbour position (x', y')
passes all these checks, then it is an actual neighbour of current, so we
return it.
So how does guard work? It is actually a function available in every monad
that is an instance of MonadFail. That's any monad that can express failure.
Maybe and Either e are two such monads, as is the list monad. Since we use
lists to return the list of all possible results of a non-deterministic
function, failure to produce any result is represented by returning the empty
list. This is similar to returning Nothing when a computation in the Maybe
monad fails to produce a result.
However, for the list monad, the effect of failure is less drastic than for the
Maybe monad. Consider the expression x >>= f. For the Maybe monad, if f
returns Nothing, then the whole computation returns Nothing—the whole
computation fails. If x >>= f is an expression that uses the list monad, then
x is a list, and we apply f to each element in this list. f may fail for
some of these elements of x and succeed for others, that is, f may produce
an empty list for some arguments from x, and a non-empty list for others. Now
remember that x >>= f = concatMap f x. Thus, the end result of the computation
is the concatenation of all the lists produced by f. If f does not fail for
all values in x, then the result of x >>= f is not empty, that is,
x >>= f does not fail.
The right way to think about this is that the computation branches on the
different values in x, as shown in this figure. You
should take x = f 5 = [3, 8, 11] in this example, and f = g. Applying g to
8 "fails", by returning the empty list. The applications of g to 3 and
11 produce two lists [7, 4] and [11, 3, 9]. The result of applying g to
x is the concatenation of the three lists [7, 4], [], and [11, 3, 9].
A branch that fails is truncated, cannot continue. A branch that does not fail
produces a value, and this value can be used in subsequent steps. In particular,
if we have a sequence of three steps x >>= f >>= g, then g is never called
in any branch where f fails. For the branches where f produces one or more
values, g is applied to each of these values. The list monad really just
implements a branching search, and branching search is how we visualize
non-determinism.
The behaviour we want for guard in the Maybe monad is that the whole
computation fails if the given condition is false, and it continues otherwise.
Thus, guard should return Nothing if the given condition is false, and
Just () if it is true:
guard :: Bool -> Maybe ()
guard True = Just ()
guard False = Nothing
For the list monad, we want that guard truncates the current branch of the
search if the given condition is not satisfied. We do this by returning the
empty list:
guard :: Bool -> [()]
guard True = [()]
guard False = []
In the implementation of neighbour, we branch on the four
possibilities of moving in each of the four directions using the expression
(dx, dy, wallToCheck) <- [
(-1, 0, wallOnLeft), (1, 0, wallOnRight),
(0, -1, wallAbove), (0, 1, wallBelow)
]
The guard expressions ensure that all the branches where we moved into an
invalid direction (we moved off the grid or jumped over a wall) are truncated.
The remaining branches that are not truncated are the ones that lead us to valid
neighbours of current, and search calls itself recursively on each of these
neighbours.
The implementation we have developed so far has a major problem: We have to be rather lucky to obtain an answer. The following figure illustrates the problem:

An example maze where the naïve search for a path from cell A to cell E does not terminate
I want you to focus on the leftmost path in the search tree. It starts with cell
A. Cell B being its only neighbour, the search continues to cell B. Cell B has
cells A and C as neighbours. The neighbours function explores the neighbours
of a cell in the order left neighbour, right neighbour, top neighbour, bottom
neighbour. Thus, the leftmost path continues to cell A again. From A, it moves
to A's only neighbour, B. From B, it has the choice to move to A or C. The
leftmost path continues on to cell A. This continues ad infinitum. Clearly, this
path never reaches E. What's worse, this path is never reported and completely
blocks the search from ever exploring any paths to the right of it in the search
tree, some of which would lead us to E. This is not difficult to see from the
implementation of search. The first branch of search applies only when
current == end. Clearly, the leftmost path never reaches this branch, because
it never reaches E. The only way the otherwise branch can return is if
neighbour current produces the empty list—in this case, this branch of the
search is truncated—or it reaches the return statement after search nbr
returns. However, search nbr never returns because the path we're exploring is
infinite.
We could fix this by searching only for paths up to a given maximum length
maxLength. When maxLength is less than the length of the path from start
to end, the search would not turn up any paths—it would fail. We could then
increase maxLength until we encounter the first success. It is not hard to see
that at that point, search produces exactly one path, and that this path does
not visit any maze cell more than once.
This strategy would work for small mazes, but it would fail for larger mazes. We
still consider paths that visit nodes more than once, only we bound their
lengths now, so we never run into an endless recursion. For every maze cell on
the current path that has at least two neighbours, we consider all of its
neighbours as possible extensions of this path, that is, we introduce a branch
in the search. The result is that even for the simple maze in the figure above,
the search explores a number of paths that is exponential in the current maximum
length maxLength. We really need a strategy that constructs only paths that
never visit any maze cell more than once.
To do so, we need to add another parameter to search. If we want to find a
path from current to end that visits nbr as the next position after
current, then clearly, we should not return to current after visiting nbr.
We want to make progress towards end, not visit cells we already visited.
Thus, we add the last position we visited before arriving at current as a
parameter to search and neighbour. neighbour excludes this last position
from the list of neighbours it proposes, so search only considers paths that
start at neighbours other than the last position visited before current:
findPath :: Maze -> Pos -> Pos -> Path
findPath maze start end = head $ search start start
where
((minCol, minRow), (maxCol, maxRow)) = bounds $ cells maze
search current last
| current == end = return [current]
| otherwise = do
nbr <- neighbour current last
pathFromNbr <- search nbr current
return $ current : pathFromNbr
neighbour current@(x, y) last = do
(dx, dy, wallToCheck) <- [
(-1, 0, wallOnLeft), (1, 0, wallOnRight),
(0, -1, wallAbove), (0, 1, wallBelow)
]
let x' = x + dx
y' = y + dy
nbr = (x' y')
guard (x' >= minCol && x' <= maxCol)
guard (y' >= minRow && y' <= maxRow)
guard (not $ wallToCheck $ cells maze ! current)
guard (nbr /= last)
return nbr
And with this change, findPath visits every grid cell at most once (because we
assumed that the maze contains no cycles; otherwise, we'd need a more
sophisticated strategy), so this implementation takes at most linear time in the
size of the maze. We don't quite have the tools to try this function in action,
because constructing a non-trivial maze by hand is no fun. We're close though.
What's missing is the IO monad that allows us to read some randomly generated
maze files and print the path we produce on screen. At the end of this book, we
will discuss three capstone projects that use many of the things we have learned
in this book to solve three interesting problems: navigating a maze, \(k\)-means
clustering of a bunch of points, and parsing a textual representation of
phylogenetic trees commonly used in bioinformatics. The project on
navigating a maze will build on the code we have developed here
and will complete it by adding the necessary I/O code.
Exercise
Remember this exercise. You were asked to implement a function
filterUps :: Ord a => [a] -> [a]
that keeps only those elements in the input list that are greater than the elements immediately before them:
>>> filterUps []
[]
>>> filterUps [1]
[]
>>> filterUps [1,2,4,3,7,10,5,4,5]
[2,4,7,10,5]
Implement this function using the list monad. (You'll still need zip.)
Solution
filterUps :: Ord a => [a] -> [a]
filterUps [] = []
filterUps xs = do
(x, y) <- zip xs (tail xs)
guard (x < y)
return y
We take every pair (x, y) in the list zip xs (tail xs). As we
discussed in the solution to
this exercise,
this is the list of all pairs of consecutive elements in xs. For each
such pair, we check whether x < y. If so, we return the second
component, y, of the pair, which adds it to the list produced by the
function.