foldl and foldr
You may have heard of the MapReduce framework for parallel programming before. The idea is simple: Many computations on sequences of elements can be expressed in terms of two primitives:
-
Map: Apply a function to every element in the sequence. The function applications to different elements in the sequence are independent of each other and thus can be carried out in parallel on multiple cores or even multiple computers.
-
Reduce: Take the values in a sequence and combine them into a single value. For most combination operators—all those that are associative—this can also be done efficiently in parallel.
By providing a framework that offers efficient parallel implementations of the map and reduce primitives, we put parallel computing at the finger tips of many programmers, whereas writing customized parallel code that is correct and efficient requires significant skill.
In Haskell, the map
function discussed before provides the "map" portion of
the MapReduce framework. The "reduce" portion is provided by the functions
foldl
and foldr
.
Both foldl
and foldr
take a combinator function f
, an initial accumulator
value init
, and a list of elements xs
. Initially, the accumulator has value
init
. Then foldl
and foldr
traverse the list xs
. For every list element
x
, we apply the combinator function f
to x
and the current accumulator
value. The result is the new accumulator value. Whatever the accumulator value
is after we have traversed the entire list is the result that foldl
or foldr
returns.
The difference between the two functions is the direction in which they traverse
the list: foldl
traverse the list from left to right. foldr
traverse the
list from right to left. For some operators, the direction of traversal is
irrelevant, at least as far as the result goes, but not as far as efficiency
goes:
>>> foldl (+) 0 [1,2,3,4,5]
15
>>> foldr (+) 0 [1,2,3,4,5]
15
For others, the direction of traversal is important. For example, we can reverse a list like this:
>>> foldl (flip (:)) [] [1,2,3,4,5]
[5,4,3,2,1]
For foldl
, the combinator function takes the current accumulator value as its
first argument and the list element as the second argument. Thus, in this
example, the first argument of the combinator function should be a list, and the
second argument should be a list element. The order of arguments of (:)
is the
opposite. flip
is a useful helper function that reverses the order of the
first two arguments of a function:
flip :: (a -> b -> c) -> (b -> a -> c)
flip f y x = f x y
Thus, foldl (flip (:)) [] [1,2,3,4,5]
prepends the elements 1 through 5 to the
empty list, which has the effect of reversing this list. The prelude includes
the function reverse :: [a] -> [a]
to reverse a list, and this function is
implemented simply as
reverse :: [a] -> [a]
reverse = foldl (flip (:)) []
foldr
traverse the list from right to left. The combinator function expects
the current list element as the first argument and the current accumulator value
as the second argument. In other words, the argument order is reversed compared
to foldl
. Intuitively, this makes sense if you think about foldl
as putting
the initial accumulator value on the left and then folding the list elements
onto it from left to right; foldr
places the initial accumulator value on the
right and then folds the list elements onto it from right to left:
If we tried to implement reverse
using foldr
, then we wouldn't have to use
flip (:)
as the combinator function. Simply (:)
suffices. But then we'd just
get the original list back because we'd be adding the list elements to the empty
list, from right to left:
>>> foldr (:) [] [1,2,3,4,5]
[1,2,3,4,5]
Here are the implementations of foldl
and foldr
:
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f = go
where
go accum (x:xs) = go (f accum x) xs
go accum _ = accum
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f = go
where
go accum (x:xs) = f x $ go accum xs
go accum _ = accum
Note that the implementation of foldl
is tail-recursive, while the
implementation of foldr
is not. Since foldr
needs to traverse the list to
find the end and then backtrack to accumulate the list elements in reverse, it
really has no choice. This has important consequences for efficiency. I think
that discussing them in detail here goes a bit too far. I'll just give a brief
summary:
-
When the fold produces a number of some other constant-size result of accumulating the list elements, then a left fold is more efficient because it avoids building up stack frames as the non-tail-recursive implementation of
foldr
would. However,foldl
is not as efficient as you would think. The problem is that due to lazy evaluation,foldl
still builds up an expression of linear size that gets reduced to a constant-size value only once we ask for its result. As a result,foldl
is used rarely. Most of the time, when a left fold is beneficial, what we really want is the strict version offoldl
, calledfoldl'
.foldl'
behaves exactly likefoldl
, but it fully evaluates each application of the combinator functionf
before proceeding to the next element instead of building up an expression that is reduced only at the end. -
Right folds are often used to produce lists from lists. If each list element in the output list can be produced from the current element of the input list and the consumer of the output list does not hold on to the whole list and simply consumes the list elements as they arrive, then lazy evaluation ensures that
foldr
still runs in constant space because it produces the output list elements on demand as they are requested, and each list element is discarded immediately after being consumed. Under certain conditions, the compiler can completely optimize the list passed fromfoldr
to the consumer away and instead turn this into a tight loop that invokes the consumer on each value produced byfoldr
.
I should also mention that there exist similar functions foldl1
and foldr1
.
The 1 means that they expect their input lists to be non-empty. These functions
panic on empty lists. For a non-empty list, they take the first or last list
element, respectively, as the initial accumulator value, and then they "add" the
other list elements to the accumulator in the same fashion that foldl
and
foldr
do. Note that the accumulator can have a different type than the list
elements for foldl
and foldr
. For foldl1
and foldr1
, it necessarily has
the same type as the list elements because we take the first or last list
element as the initial accumulator value. Here are the implementations of
foldl1
and foldr1
:
foldl1 :: (a -> a -> a) -> [a] -> a
foldl1 _ [] = error "Prelude.foldl1: empty list"
foldl1 f (x:xs) = foldl f x xs
foldr1 :: (a -> a -> a) -> [a] -> a
foldr1 f = go
where
go [] = error "Prelude.foldr1: empty list"
go [x] = x
go (x:xs) = f x $ go xs
Since map
and foldl
/foldr
are very useful for expressing a lot of
computations, they have been generalized to containers beyond lists. We will
have a look at them in the chapter on
Functors and Folds. A functor is the abstraction of
a container type that supports application of a function to each of its
elements, in the same way that map
applies a function to every list element.
In particular, the list type is a functor. A foldable container is one that
supports foldl
and foldr
operations. Indeed, in today's standard library,
foldl
and foldr
do not have the types I listed above at all. They are
defined for arbitrary foldable containers:
>>> :t foldl
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
>>> :t foldr
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b