Coroutines
I know I'm supposed to teach you Haskell, and right now, we're talking about Scheme. We'll actually take one step farther away from Haskell by talking about Python next. In the next section, we'll finally look at how to translate everything in this section to Haskell. The reason why we talk about Python first is because we are going to talk about a concept that can be confusing in its own right, even without worrying about continuation. We'll look at coroutines. Once we understand how they work in Python (and a number of other languages), we figure out how to implement them in Scheme, by using continuations. And in the next section, we translate this implementation using continuations to Haskell.
Here's the problem we want to solve. We want to generate the list of all strings
over the alphabet ['0', '1']
of at most a given length, in lexicographically
sorted order. Here's the list of strings of length at most 3:
0
00
000
001
01
010
011
1
10
100
101
11
110
111
Note that there's an empty line at the beginning of this listing. That's the empty string, which is clearly also a string of length at most 3 over the given alphabet.
Now, we don't want to generate the entire list in one shot. That would take too
much memory. What we want is an iterator of these strings, so we can inspect
these strings one at a time and do something with each of them. The above list
was produced using the following Python code, which we already discussed in
class (only there, the alphabet was ['a', 'b']
):
def lexy(n):
yield ""
if n > 0:
for x in "01":
for xs in lexy(n - 1):
yield x + xs
for xs in lexy(3):
print(xs)
The for-loop iterates over the elements returned by the iterator lexy(3)
and
prints each element. lexy
is a coroutine. The first thing that lexy(n)
does
is to yield
the empty string. What yield
does is to suspend the execution of
lexy
and return to the caller, with the argument given to yield
as return
value. Thus, the first string the for-loop sees is the empty string, which it
prints. The way Python's for-loop requests the next element from the iterator is
to resume the coroutine. This doesn't call it from the beginning. Instead, the
computation picks up where yield
suspended it. In this case, lexy
continues
its computation from the if-statement. Since we have n = 3
in the current
invocation, this satisfies the condition that n > 0
, and lexy(3)
starts the
two nested for-loops. The outer loop iterates over all characters in the string
"01"
, that is, over the characters 0
and 1
. For each of these characters,
it iterates over all strings that are one character shorter. Those are the
strings produced by lexy(n - 1)
. For each such string, we yield the
concatenation x + xs
back to the caller. Once again, this only suspends the
computation. When the for-loop that prints the strings resumes lexy(3)
, this
picks up right after yield x + xs
, so we start another iteration unless this
was the last iteration. If it was the last iteration, then lexy(3)
returns,
and this is the signal to the for-loop that it should terminate.
An important point about this code is that most of the time during its
execution, we have four suspendes coroutines: Consider the state right before
printing the string 000
. How did we obtain this string? lexy(3)
must be in
the first iteration of its outer loop to produce the first 0
in this string.
It obtained the remainder of the string, 00
, from lexy(2)
. How did lexy(2)
produce this string? It must currently be in the first iteration of its outer
loop to produce the first character of 00
. It obtained the second 0
from
lexy(1)
. lexy(1)
constructs this string from 0
and the empty string
returned by lexy(0)
. lexy(0)
yields this empty string, that is, it
suspends itself in the process. lexy(1)
yields 0
and suspends itself in
the process. lexy(2)
yields 00
and suspends itself in the process.
lexy(3)
yields 000
and suspends itself in the process. We have four
invocations lexy(0)
, lexy(1)
, lexy(2)
, and lexy(3)
that are all
suspended. In order to obtain the next string, we resume lexy(3)
, which asks
for the next string from lexy(2)
by resuming it. This resumes lexy(1)
and
then lexy(0)
. lexy(0)
exits because it has nothing else to yield. lexy(1)
then moves on to the next iteration corresponding to the character 1
and makes
a new call to lexy(0)
This new call of lexy(0)
yields the empty string
again, and the whole game starts again, suspending lexy(0)
, then lexy(1)
,
then lexy(2)
, then lexy(3)
, and we print the string 001
.
I hope it's clear now how this code works. Scheme does not have coroutines, but we can build our own using continuations:
(define (lexy n yield)
(call/cc (lambda (continue) (yield (cons '() continue))))
(unless (zero? n)
(map (lambda (c)
(let ((result (call/cc (lambda (yield) (lexy (- n 1) yield)))))
(when (car result)
(yield (cons (cons c (car result)) (cdr result))))))
'(#\0 #\1)))
(cons #f #f))
(let ((result (call/cc (lambda (yield) (lexy 3 yield)))))
(when (car result)
(display (list->string (car result)))
(newline)
((cdr result))))
First, let's understand the behaviour we want lexy
to have here. It takes
two arguments, the length n
of the strings to be generated—that's the same
as for our Python version of lexy
—and a continuation yield
that we use to
yield values back to the caller. Every time lexy
calls yield
, it does so
with a pair of values. The first element of the pair is the yielded string or
#f
to indicate that the iteration is finished. The second element of the pair
is a continuation that can be invoked to resume the execution of lexy
.
An important point is that, due to some awkwardness of working with strings in
Scheme, lexy
does not actually return a string but a list of characters. We
convert this list to a string using list->string
. So that's what the main loop
in the last five lines of the program does: We use call/cc
to create a
continuation yield
we can pass to lexy
, and lexy
uses this continuation to
yield values. We call (lexy 3 yield)
. This returns a pair composed of the
first string to be printed and the continuation we can use to resume lexy
. In
Scheme, we can take a pair apart into its components using car
(Haskell's
fst
) and cdr
(Haskell's snd
). Moreover, in Scheme, any non-#f
value is
treated as true. Thus, the test (when (car result) ...)
tests whether the
first component of the result is not #f
. It's a list of characters in this
case. We convert this list to a string and print it, adding a newline. And then
we invoke the second element of result
, which is the continuation that resumes
lexy
. lexy
yields the next string together with a continuation to resume
lexy
. This makes call/cc
return a second time. Once again, we print the
yielded string and use the yielded continuation to resume lexy
. This continues
until the first component of the pair yielded by lexy
is #f
. At that point,
the code in the when
block is not executed, so the program exits.
Now let's look at lexy
itself. No matter the value of n
, lexy
should
always yield the empty string or, as a list of characters, an empty list. That's
the first thing it does. But it needs to bundle this with a continuation to
resume lexy
. Thus, it creates a continuation using call/cc
and invokes
yield
with a pair composed of the empty list and this continuation as
argument. Resuming lexy
will make this invocation of call/cc
return. In this
case, we don't expect any value to be returned by call/cc
. The next thing
lexy
does is to test whether n
is zero. If it is, then the (unless ...)
block is not executed. Thus, the only thing that lexy
does is to return the
pair (cons #f #f)
with both its components set to #f
. This makes the
call/cc
that was used to invoke lexy
return one last time. The caller tests
whether the first component of the returned pair is #f
and then doesn't try to
resume lexy
, which wouldn't work because the second part of this pair is not a
continuation in this case.
If n
is not zero, it is hopefully positive—we should technically check this.
In this case, we map an anonymous function over the list of characters
'(#\0 #\1)
. #\x
is how we write an individual character x
in Scheme. In
other words, that's the same as 'x'
in most other languages. For each
character c
in this list, the anonymous function starts by calling
(lexy (- n 1) yield)
, where yield
is a continuation created using call/cc
to yield values back to the current invocation of lexy
. The logic now is
essentially the same as for the main loop, only now we don't print values, but
we yield values back to the caller of the current invocation of lexy
.
Specifically, for every list xs
yielded by (lexy (- n 1) yield)
—that's the
car
component of result
—we prepend c
to this list using
(cons c (car result))
, and then we yield this list back to the caller.
An interesting point is that the continuation we pass to the caller to resume
the iteration is not a continuation to resume the current invocation of lexy
directly but simply the continuation returned by (lexy (- n 1) yield)
. This
makes our Scheme code behave slightly differently from the Python code, even
though the result is the same. Remember that at a time when we had four
suspended invocations of lexy
in Python, we resumed them one at a time "from
the outside in" and then suspended them again "from the inside out". Since
(lexy n yield)
simply yields the continuation returned by
(lexy (- n 1) yield)
, what the outer loop that prints the strings receives is
the continuation to directly resume the innermost invocation (lexy 0 yield)
.
By invoking this continuation, we jump directly into this innermost invocation
of lexy
without threading our way through (lexy 3 yield)
, (lexy 2 yield)
,
and (lexy 1 yield)
. (lexy 0 yield)
then returns, and this makes
(lexy 1 yield)
make a new call to (lexy 0 yield)
. This creates a new
continuation to resume this new call of (lexy 0 yield)
, which gets yielded
back to (lexy 1 yield)
, which yields this back to (lexy 2 yield)
, which
yields this back to (lexy 3 yield)
, which yields this back to the print loop.
Thus, by resuming (lexy 0 yield)
, we end up making all the other invocations
of lexy
resume their computations as well, due to the way the different
continuations are connected in this program.
In Haskell, we can do all this without using continuations, as we discuss below. In Python or Scheme, we really need coroutines or continuations to do this. If we don't use coroutines or continuations, we have two options: We can easily generate the entire list of strings and then iterate over it. But that does not meet our requirement that the strings should be generated one string at a time. Storing all strings at once uses too much space. The other option would be to implement an iterator object in the same way that a C++ or Java programmer would solve this problem, who don't have coroutines or continuations at their disposal. Implementing such an iterator object is somewhat painful because what this object has to represent is the position where we are in iterating over the characters in each of the string positions. In Python, each suspended coroutine tracks the state of each character, and it does so simply by means of its suspended for-loop. No manual bookkeeping is required. Admittedly, this example isn't too bad even in Java or C++ if you ignore the excessive amount of boilerplate code required to implement an iterator, especially in C++. However, for more complex nested iterations, things become more tedious quickly.
The reason why we can get away without coroutines or continuations in Haskell is
that if we do generate the entire list in one go, the list elements are still
generated one element at a time, thanks to Haskell's lazy evaluation. Here's
lexy
in Haskell:
lexy :: Int -> [String]
lexy 0 = [""]
lexy n = "" : concatMap (\x -> map (x :) (lexy (n - 1))) "01"
>>> :{
| lexy 0 = [""]
| lexy n = "" : concatMap (\x -> map (x :) (lexy (n - 1))) "01"
| :}
>>> mapM_ putStrLn (lexy 3)
0
00
000
001
01
010
011
1
10
100
101
11
110
111