stackless coroutine

coroutines are program instructions (eg: functions) whose execution can be suspended and resumed; coroutines can have a full stack (ie: stackful) or only one stack frame (ie: stackless); typically, stackless coroutines are much more performant than stackful coroutines (like 10-100x faster), and they are the topic in this article;

stateless and stateful functions

normally, the functions we write are stateless; they are subroutines with one entry point and one exit point; the function can return a value when it exits; in fact functions can have multiple exit points, but without loss of generality we only consider those with exactly one exit point, which is usually a return statement placed at the end the function; we say such functions are “stateless” because every time we call the function, it starts running from its entry point, and runs until its exit point; the function does not remember its execution state between multiple invocations;

on the other hand, stateful functions can suspend itself in the middle of its execution (optionally yielding a value), and they remember where they suspended so that next time it is invoked it can resume from the latest suspension point;

this pseudo c example compares stateless and stateful functions:

int stateless() {
    return 0;
    return 1;
    return 2;
    return 3;
}

stateless();    //  output 0;
stateless();    //  output 0;
stateless();    //  output 0;
stateless();    //  output 0;

int stateful() {
    yield 0;
    yield 1;
    yield 2;
    yield 3;
}

stateful();     //  output 0;
stateful();     //  output 1;
stateful();     //  output 2;
stateful();     //  output 3;

while function stateless() can only return the same value each time it is called, function stateful() can suspend itself at each yield statement and resume its execution from where it left off, producing four different values;

stateful functions in real c

there are no yield statements in c; why do we give an example in pseudo c? because we will implement this yield statement in real c; we begin with an implementation of stateful functions without yield, then define yield as a macro encapsulating the mechanisms we use in implementing stateful functions;

c has a mechanism where you can declare a static variable inside a function, and that variable maintains its value across multiple calls of this function; this is how we keep function state between multiple calls; another c mechanism is goto statement which allows free-style jumps within a function; this is used to resume from the latest suspension point;

int stateful() {
    static int state = 0;
    switch (state++) {
        case 0:
            goto point0;
        case 1:
            goto point1;
        case 2:
            goto point2;
        case 3:
            goto point3;
    }
    point0: return 0;
    point1: return 1;
    point2: return 2;
    point3: return 3;
}

stateful();     //  output 0;
stateful();     //  output 1;
stateful();     //  output 2;
stateful();     //  output 3;

stackless coroutines are stateful functions

are you aware of it? we have just implemented a stackless coroutine! this is because stackless coroutines are stateful functions; there are exceptions, for example, when the coroutine is not made of a function in high-level language, but those are rare cases;

our implementation discloses the internal structure of a stackless coroutine: there is a state variable maintained across multiple function calls, which is used to jump directly to the latest suspension point; it is like the function has been cut into several segments, and each invocation runs one segment;

implement `yield` statement

we do not want to write a switch statement for every stackless coroutine we define; it would become infeasible to maintain such a statement; so we will shape our program into a more ergonomic form with the same functionality;

the first thing we do is to assign a constant to each state instead of self-incrementing the state variable; we do not always increment the state every time we call it, because multiple states may be merged into one (eg: when the function has loops); plus, this makes it easy to turn these statements into macros; note that the state is set to the next one before each suspension point, so that it has the correct value when the function is resumed;

int stateful() {
    static int state = 0;
    switch (state) {
        case 0:
            goto point0;
        case 1:
            goto point1;
        case 2:
            goto point2;
        case 3:
            goto point3;
    }
    point0: state = 1; return 0;
    point1: state = 2; return 1;
    point2: state = 3; return 2;
    point3: state = 4; return 3;
}

the second thing we do is to turn goto labels into case labels, leveraging the device invented by tom duff; this allows us to eliminate goto statements from the function by inlining the goto statement body; the compiler might give a warning about non-void function does not return a value; this does not matter in this example, because we will not call a coroutine that is done;

int stateful() {
    static int state = 0;
    switch (state) {
        case 0:
            state = 1; return 0;
        case 1:
            state = 2; return 1;
        case 2:
            state = 3; return 2;
        case 3:
            state = 4; return 3;
    }
}

the third thing we do is to combine a case label with its previous return statement, to facilitate writing the macro; we also add an extra case label after the last return statement to make things consistent; the function body starts looking hacky and wacky; do you see macros in it?

int stateful() {
    static int state = 0; switch (state) { case 0:;
        state = 1; return 0; case 1:;
        state = 2; return 1; case 2:;
        state = 3; return 2; case 3:;
        state = 4; return 3; case 4:;
    }
}

the last thing we do is to turn those highly similar statements into macros:

#define begin static int state = 0; switch (state) { case 0:
#define yield(i, n) do { state = i; return n; case i:; } while (0)
#define end }

int stateful() {
    begin;
    yield(1, 0);
    yield(2, 1);
    yield(3, 2);
    yield(4, 3);
    end;
}

the greatness about this piece of code is that everyone who has seen it cannot help calling it “HACK!” yet it compiles and gives the correct output; look, we have implemented the yield statement (as a macro) and it works in a stateful function (ie: stackless coroutine); in fact, this works even when the function gets more complicated (eg: having loops); there is a great article by simon tatham giving more complicated examples;

to summarize, this is how we transformed pseudo yield statement into a real implementation:

function variables whose state needs to be kept are declared static;
convert yield to return;
set the state variable immediately before return, to the next state;
add a case label immediately after return;
call switch at function entry to jump to case label for current state;

it is worth noting that this construction works with nested stateful functions; however, if a stateful function is used multiple times, then its state needs to be re-initialized before each use; this is not possible when there is only one copy of state (eg: in a static variable); hence we need to allocate this state on heap in an activation frame for each function invocation;

activation frame

talks about coroutines often involve a concept named activation frame; this concept is a generalization of stack frame; while a stack frame (of a normal function) is allocated on the stack, an activation frame (of a coroutine) is often allocated on the heap, because this activation frame needs to be kept across coroutine yield points; a yield in stackless coroutine is like a return, and its stack frame will be discarded, but its activation frame on the heap can be preserved; more details can be found in this article; in our example we simply used a static variable as the activation frame to preserve coroutine state, but allocating the activation frame on heap is the norm in language implementations;

async/await pattern

in languages that employ stackless coroutines, we can often see the async/await pattern; the async keyword declares a function as asynchronous; the await keyword suspends an asynchronous function if necessary, waiting for it to finish before returning its result;

an async function is basically a stackless coroutine; there is a rule about async functions that often looks puzzling: the caller of an async function must also be an async function itself; that is, a sync function cannot call an async function, but an async function can call a sync function; people frequently ask why this has to be the case; if you remember stackless coroutines are basically stateful functions, then you understand async functions are more “advanced” than sync functions because they can keep their state; when a call chain consists solely of “advanced” async functions, suspend/resume work fine because each one of them can suspend/resume; having one sync function in the chain breaks the chain, losing the advantage of being able to suspend/resume, because that sync function cannot remember its state;

the await of an async function involves a concept named future, promise or delay; this is a proxy to something that is to be resolved later; an async function itself is an example of a future: the return (not yield) value of an async function is resolved only when it returns; in other cases, a future could be the response of a network request, the blob of a database read, etc.; when an async function awaits a future, it binds itself to the future and suspends itself; when the future is later resolved, this async function is resumed, with the resolved future value as return value of the await statement;