Coroutines

I know I'm supposed to teach you Haskell, and right now, we're talking about Scheme. We'll actually take one step farther away from Haskell by talking about Python next. In the next section, we'll finally look at how to translate everything in this section to Haskell. The reason why we talk about Python first is because we are going to talk about a concept that can be confusing in its own right, even without worrying about continuation. We'll look at coroutines. Once we understand how they work in Python (and a number of other languages), we figure out how to implement them in Scheme, by using continuations. And in the next section, we translate this implementation using continuations to Haskell.

Here's the problem we want to solve. We want to generate the list of all strings over the alphabet ['0', '1'] of at most a given length, in lexicographically sorted order. Here's the list of strings of length at most 3:

Note that there's an empty line at the beginning of this listing. That's the empty string, which is clearly also a string of length at most 3 over the given alphabet.

Now, we don't want to generate the entire list in one shot. That would take too much memory. What we want is an iterator of these strings, so we can inspect these strings one at a time and do something with each of them. The above list was produced using the following Python code, which we already discussed in class (only there, the alphabet was ['a', 'b']):

def lexy(n):
    yield ""
    if n > 0:
        for x in "01":
            for xs in lexy(n - 1):
                yield x + xs

for xs in lexy(3):
    print(xs)

The for-loop iterates over the elements returned by the iterator lexy(3) and prints each element. lexy is a coroutine. The first thing that lexy(n) does is to yield the empty string. What yield does is to suspend the execution of lexy and return to the caller, with the argument given to yield as return value. Thus, the first string the for-loop sees is the empty string, which it prints. The way Python's for-loop requests the next element from the iterator is to resume the coroutine. This doesn't call it from the beginning. Instead, the computation picks up where yield suspended it. In this case, lexy continues its computation from the if-statement. Since we have n = 3 in the current invocation, this satisfies the condition that n > 0, and lexy(3) starts the two nested for-loops. The outer loop iterates over all characters in the string "01", that is, over the characters 0 and 1. For each of these characters, it iterates over all strings that are one character shorter. Those are the strings produced by lexy(n - 1). For each such string, we yield the concatenation x + xs back to the caller. Once again, this only suspends the computation. When the for-loop that prints the strings resumes lexy(3), this picks up right after yield x + xs, so we start another iteration unless this was the last iteration. If it was the last iteration, then lexy(3) returns, and this is the signal to the for-loop that it should terminate.

An important point about this code is that most of the time during its execution, we have four suspendes coroutines: Consider the state right before printing the string 000. How did we obtain this string? lexy(3) must be in the first iteration of its outer loop to produce the first 0 in this string. It obtained the remainder of the string, 00, from lexy(2). How did lexy(2) produce this string? It must currently be in the first iteration of its outer loop to produce the first character of 00. It obtained the second 0 from lexy(1). lexy(1) constructs this string from 0 and the empty string returned by lexy(0). lexy(0) yields this empty string, that is, it suspends itself in the process. lexy(1) yields 0 and suspends itself in the process. lexy(2) yields 00 and suspends itself in the process. lexy(3) yields 000 and suspends itself in the process. We have four invocations lexy(0), lexy(1), lexy(2), and lexy(3) that are all suspended. In order to obtain the next string, we resume lexy(3), which asks for the next string from lexy(2) by resuming it. This resumes lexy(1) and then lexy(0). lexy(0) exits because it has nothing else to yield. lexy(1) then moves on to the next iteration corresponding to the character 1 and makes a new call to lexy(0) This new call of lexy(0) yields the empty string again, and the whole game starts again, suspending lexy(0), then lexy(1), then lexy(2), then lexy(3), and we print the string 001.

I hope it's clear now how this code works. Scheme does not have coroutines, but we can build our own using continuations:

(define (lexy n yield)
  (call/cc (lambda (continue) (yield (cons '() continue))))
  (unless (zero? n)
    (map (lambda (c)
           (let ((result (call/cc (lambda (yield) (lexy (- n 1) yield)))))
             (when (car result)
               (yield (cons (cons c (car result)) (cdr result))))))
         '(#\0 #\1)))
  (cons #f #f))


(let ((result (call/cc (lambda (yield) (lexy 3 yield)))))
  (when (car result)
    (display (list->string (car result)))
    (newline)
    ((cdr result))))

First, let's understand the behaviour we want lexy to have here. It takes two arguments, the length n of the strings to be generated—that's the same as for our Python version of lexy—and a continuation yield that we use to yield values back to the caller. Every time lexy calls yield, it does so with a pair of values. The first element of the pair is the yielded string or #f to indicate that the iteration is finished. The second element of the pair is a continuation that can be invoked to resume the execution of lexy.

An important point is that, due to some awkwardness of working with strings in Scheme, lexy does not actually return a string but a list of characters. We convert this list to a string using list->string. So that's what the main loop in the last five lines of the program does: We use call/cc to create a continuation yield we can pass to lexy, and lexy uses this continuation to yield values. We call (lexy 3 yield). This returns a pair composed of the first string to be printed and the continuation we can use to resume lexy. In Scheme, we can take a pair apart into its components using car (Haskell's fst) and cdr (Haskell's snd). Moreover, in Scheme, any non-#f value is treated as true. Thus, the test (when (car result) ...) tests whether the first component of the result is not #f. It's a list of characters in this case. We convert this list to a string and print it, adding a newline. And then we invoke the second element of result, which is the continuation that resumes lexy. lexy yields the next string together with a continuation to resume lexy. This makes call/cc return a second time. Once again, we print the yielded string and use the yielded continuation to resume lexy. This continues until the first component of the pair yielded by lexy is #f. At that point, the code in the when block is not executed, so the program exits.

Now let's look at lexy itself. No matter the value of n, lexy should always yield the empty string or, as a list of characters, an empty list. That's the first thing it does. But it needs to bundle this with a continuation to resume lexy. Thus, it creates a continuation using call/cc and invokes yield with a pair composed of the empty list and this continuation as argument. Resuming lexy will make this invocation of call/cc return. In this case, we don't expect any value to be returned by call/cc. The next thing lexy does is to test whether n is zero. If it is, then the (unless ...) block is not executed. Thus, the only thing that lexy does is to return the pair (cons #f #f) with both its components set to #f. This makes the call/cc that was used to invoke lexy return one last time. The caller tests whether the first component of the returned pair is #f and then doesn't try to resume lexy, which wouldn't work because the second part of this pair is not a continuation in this case.

If n is not zero, it is hopefully positive—we should technically check this. In this case, we map an anonymous function over the list of characters '(#\0 #\1). #\x is how we write an individual character x in Scheme. In other words, that's the same as 'x' in most other languages. For each character c in this list, the anonymous function starts by calling (lexy (- n 1) yield), where yield is a continuation created using call/cc to yield values back to the current invocation of lexy. The logic now is essentially the same as for the main loop, only now we don't print values, but we yield values back to the caller of the current invocation of lexy. Specifically, for every list xs yielded by (lexy (- n 1) yield)—that's the car component of result—we prepend c to this list using (cons c (car result)), and then we yield this list back to the caller.

An interesting point is that the continuation we pass to the caller to resume the iteration is not a continuation to resume the current invocation of lexy directly but simply the continuation returned by (lexy (- n 1) yield). This makes our Scheme code behave slightly differently from the Python code, even though the result is the same. Remember that at a time when we had four suspended invocations of lexy in Python, we resumed them one at a time "from the outside in" and then suspended them again "from the inside out". Since (lexy n yield) simply yields the continuation returned by (lexy (- n 1) yield), what the outer loop that prints the strings receives is the continuation to directly resume the innermost invocation (lexy 0 yield). By invoking this continuation, we jump directly into this innermost invocation of lexy without threading our way through (lexy 3 yield), (lexy 2 yield), and (lexy 1 yield). (lexy 0 yield) then returns, and this makes (lexy 1 yield) make a new call to (lexy 0 yield). This creates a new continuation to resume this new call of (lexy 0 yield), which gets yielded back to (lexy 1 yield), which yields this back to (lexy 2 yield), which yields this back to (lexy 3 yield), which yields this back to the print loop. Thus, by resuming (lexy 0 yield), we end up making all the other invocations of lexy resume their computations as well, due to the way the different continuations are connected in this program.

In Haskell, we can do all this without using continuations, as we discuss below. In Python or Scheme, we really need coroutines or continuations to do this. If we don't use coroutines or continuations, we have two options: We can easily generate the entire list of strings and then iterate over it. But that does not meet our requirement that the strings should be generated one string at a time. Storing all strings at once uses too much space. The other option would be to implement an iterator object in the same way that a C++ or Java programmer would solve this problem, who don't have coroutines or continuations at their disposal. Implementing such an iterator object is somewhat painful because what this object has to represent is the position where we are in iterating over the characters in each of the string positions. In Python, each suspended coroutine tracks the state of each character, and it does so simply by means of its suspended for-loop. No manual bookkeeping is required. Admittedly, this example isn't too bad even in Java or C++ if you ignore the excessive amount of boilerplate code required to implement an iterator, especially in C++. However, for more complex nested iterations, things become more tedious quickly.

The reason why we can get away without coroutines or continuations in Haskell is that if we do generate the entire list in one go, the list elements are still generated one element at a time, thanks to Haskell's lazy evaluation. Here's lexy in Haskell:

lexy :: Int -> [String]
lexy 0 = [""]
lexy n = "" : concatMap (\x -> map (x :) (lexy (n - 1))) "01"

GHCi

>>> :{
  | lexy 0 = [""]
  | lexy n = "" : concatMap (\x -> map (x :) (lexy (n - 1))) "01"
  | :}
>>> mapM_ putStrLn (lexy 3)

0
00
000
001
01
010
011
1
10
100
101
11
110
111