Source Code Analysis of Yun Feng's (cloudwu) coroutine Library

Project Overview: coroutine

Yun Feng wrote a simple coroutine library. Its README describes it as follows:

It’s an asymmetric coroutine library (like lua).

You can use coroutine_open to open a schedule first, and then create coroutine in that schedule.

You should call coroutine_resume in the thread that you call coroutine_open, and you can’t call it in a coroutine in the same schedule.

Coroutines in the same schedule share the stack , so you can create many coroutines without worry about memory.

But switching context will copy the stack the coroutine used.

The whole project consists of just coroutine.h, coroutine.c, and main.c, and it exposes only a few simple APIs:

```c
struct schedule * coroutine_open(void);
void coroutine_close(struct schedule *);
int coroutine_new(struct schedule *, coroutine_func, void *ud);
void coroutine_resume(struct schedule *, int id);
int coroutine_status(struct schedule *, int id);
int coroutine_running(struct schedule *);
void coroutine_yield(struct schedule *);
```

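Below, I first go over the low-level context-switching functions the coroutine library relies on, and then analyze the library's design and implementation.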

POSIX ucontext API

Definitions

A thread's context consists of its stack, its registers, and its mask of blocked signals.

The ucontext_t structure represents a thread's context and contains at least the following members:

```c
ucontext_t *uc_link;     /* context to assume when this one returns */
sigset_t    uc_sigmask;  /* signals being blocked */
stack_t     uc_stack;    /* stack area */
mcontext_t  uc_mcontext; /* saved registers */
```

By saving a thread's context into a ucontext_t and later replacing the current context with the saved one, we can save and switch user-level threads.

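POSIX defines four functions that operate on ucontext_t to implement context switching and related features (they were marked obsolescent in POSIX.1-2001 and removed in POSIX.1-2008, but glibc still provides them):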

```c
int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);
void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
int swapcontext(ucontext_t *oucp, const ucontext_t *ucp);
```

The man pages describe them as follows:

The getcontext() function saves the current thread’s execution context in the structure pointed to by ucp. This saved context may then later be restored by calling setcontext().

In other words, getcontext() saves the current thread's context into ucp.

The setcontext() function makes a previously saved thread context the current thread context, i.e., the current context is lost and setcontext() does not return. Instead, execution continues in the context specified by ucp, which must have been previously initialized by a call to getcontext(), makecontext(3), or by being passed as an argument to a signal handler (see sigaction(2)).

If ucp was initialized by getcontext(), then execution continues as if the original getcontext() call had just returned (again).

If ucp was initialized by makecontext(3), execution continues with the invocation of the function specified to makecontext(3). When that function returns, ucp->uc_link determines what happens next: if ucp->uc_link is NULL, the process exits; otherwise, setcontext(ucp->uc_link) is implicitly invoked.

If ucp was initialized by the invocation of a signal handler, execution continues at the point the thread was interrupted by the signal.

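In other words, setcontext() replaces the current thread's context with the one stored in ucp. Control jumps elsewhere, so on success the function never returns (it returns -1 only on error).
If ucp was set by getcontext(), execution continues as if that getcontext() call had just returned.
If ucp was set by makecontext(), the func registered with makecontext() is called; when func returns, the context switches to the one ucp->uc_link points to (or the process exits if uc_link is NULL).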
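To see the "returns again" behavior concretely, here is a minimal sketch that jumps back to a getcontext() point with setcontext() until a counter reaches a limit. The function name count_getcontext_returns is made up for illustration, and the sketch assumes a libc that still ships the ucontext functions, such as glibc on Linux.

```c
#include <ucontext.h>

/* Hypothetical helper: returns how many times control came out of getcontext(). */
static int count_getcontext_returns(int limit) {
    ucontext_t ctx;
    volatile int returns = 0;   /* volatile so the incremented value survives each jump */

    getcontext(&ctx);           /* setcontext(&ctx) resumes right after this call */
    returns++;
    if (returns < limit)
        setcontext(&ctx);       /* on success this never returns */
    return returns;
}
```

The volatile qualifier matters: setcontext() restores the registers saved by getcontext(), so a non-volatile counter cached in a register could be reset to its old value on every jump.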

The makecontext() function modifies the user thread context pointed to by ucp, which must have previously been initialized by a call to getcontext(3) and had a stack allocated for it. The context is modified so that it will continue execution by invoking func() with the arguments (of type int) provided. The argc argument must be equal to the number of additional arguments provided to makecontext() and also equal to the number of arguments to func(), or else the behavior is undefined.

The ucp->uc_link argument must be initialized before calling makecontext() and determines the action to take when func() returns: if equal to NULL, the process exits; otherwise, setcontext(ucp->uc_link) is implicitly invoked.

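In other words, makecontext() modifies ucp so that the next switch to this context calls func().
argc must equal the number of variadic arguments that follow; those arguments (which must be of type int) are passed to func().
Before calling makecontext(), you must make sure that:
1. ucp has been initialized by getcontext();
2. ucp->uc_stack points to a stack allocated for this context;
3. ucp->uc_link has been set; when func() returns, the thread switches to the context it points to (or exits if it is NULL).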
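The three requirements above map onto a short sequence of calls. The sketch below is illustrative only: run_task and task are hypothetical names, the 64 KiB stack size is an arbitrary assumption, and swapcontext() is used to enter the new context so that uc_link has somewhere to return to.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, task_ctx;
static int task_result;

/* Entry point of the new context; makecontext() passes it two int arguments. */
static void task(int a, int b) {
    task_result = a + b;
}   /* returning switches to task_ctx.uc_link, i.e. back to caller_ctx */

/* Hypothetical helper: runs task(a, b) on its own heap-allocated stack. */
static int run_task(int a, int b) {
    size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    assert(stack != NULL);

    getcontext(&task_ctx);                  /* 1. initialize with getcontext() */
    task_ctx.uc_stack.ss_sp = stack;        /* 2. give the context a stack */
    task_ctx.uc_stack.ss_size = stack_size;
    task_ctx.uc_link = &caller_ctx;         /* 3. where to go when task() returns */
    makecontext(&task_ctx, (void (*)(void))task, 2, a, b);

    swapcontext(&caller_ctx, &task_ctx);    /* comes back here once task() returns */
    free(stack);
    return task_result;
}
```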

The swapcontext() function saves the current thread context in *oucp and makes *ucp the currently active context.

In other words, swapcontext() saves the current context in oucp and switches to the context in ucp.

Example

```cpp
#include <cstdio>
#include <iostream>
#include <ucontext.h>
using namespace std;

// ctx[0] <-> main
// ctx[1] <-> f1
// ctx[2] <-> f2
ucontext_t ctx[3];

using ucfunc_t = void (*)(void);

static void f1(int p) {
    printf("start f1 of %d\n", p);
    swapcontext(&ctx[1], &ctx[2]);
    puts("finish f1");
}

static void f2(int p) {
    printf("start f2 of %d\n", p);
    swapcontext(&ctx[2], &ctx[1]);
    puts("finish f2");
}

int main() {
    cout << "main begin" << endl;
    char stk1[8192];
    char stk2[8192];

    getcontext(&ctx[1]);
    ctx[1].uc_link = &ctx[0];              // when f1 returns, resume main
    ctx[1].uc_stack.ss_sp = stk1;
    ctx[1].uc_stack.ss_size = sizeof stk1;
    makecontext(&ctx[1], (ucfunc_t)f1, 1, 1);

    getcontext(&ctx[2]);
    ctx[2].uc_link = &ctx[1];              // when f2 returns, resume f1
    ctx[2].uc_stack.ss_sp = stk2;
    ctx[2].uc_stack.ss_size = sizeof stk2;
    makecontext(&ctx[2], (ucfunc_t)f2, 1, 2);

    // Control flow: main.1 -> f2.1 -> f1.1 -> f2.2 -> f1.2 -> main.2
    swapcontext(&ctx[0], &ctx[2]);
    cout << "main end" << endl;
}
```

Output:

```
main begin
start f2 of 2
start f1 of 1
finish f2
finish f1
main end
```

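Note that makecontext() declares func as taking no parameters (glibc's prototype uses void (*)(void)), so to pass a function that does take parameters we have to cast it. This works in practice because makecontext() only needs func's address and sets up the int arguments itself; strictly speaking, ISO C does not define calling a function through an incompatible pointer type, but this cast is exactly how makecontext() is meant to be used.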

So why must the arguments passed to func be of type int?

An answer on Stack Overflow

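In short, it is a design flaw: C variadic arguments carry no type information, so makecontext() can only assume they are ints. This is inconvenient; for example, passing a 64-bit pointer requires splitting it into two int arguments. A better design would have declared func as void (*)(void *), letting callers pack their arguments into a struct and pass the struct's address.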

But it didn’t.
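The workaround mentioned above, splitting a pointer across two int-sized arguments, can be sketched as follows. The helper names split_ptr and join_ptr are hypothetical; widening to uint64_t before shifting keeps the shift by 32 well defined even on targets where pointers are 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helpers: carry a pointer through two 32-bit arguments. */
static void split_ptr(void *p, uint32_t *lo, uint32_t *hi) {
    uint64_t v = (uint64_t)(uintptr_t)p;  /* widen first: shifting a 32-bit value by 32 is undefined */
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
}

static void *join_ptr(uint32_t lo, uint32_t hi) {
    uint64_t v = (uint64_t)lo | ((uint64_t)hi << 32);
    return (void *)(uintptr_t)v;
}
```

A function started by makecontext() would declare two 32-bit parameters and rebuild the pointer with join_ptr(); coroutine.c's mainfunc() does the same thing inline.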

Design and Implementation of the coroutine Library

What Is a Coroutine

What is a coroutine? - zhihu

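The key point is that coroutines have nothing to do with threads. The coroutine library's sample program is single-threaded, and if several threads were to use the same schedule, you would still need locking.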

Usage

main.c:

```c
#include "coroutine.h"
#include <stdio.h>

struct args {
    int n;
};

static void foo(struct schedule *S, void *ud) {
    struct args *arg = ud;
    int start = arg->n;
    int i;
    for (i = 0; i < 5; i++) {
        printf("coroutine %d : %d\n", coroutine_running(S), start + i);
        coroutine_yield(S);
    }
}

static void test(struct schedule *S) {
    struct args arg1 = { 0 };
    struct args arg2 = { 100 };

    int co1 = coroutine_new(S, foo, &arg1);
    int co2 = coroutine_new(S, foo, &arg2);
    printf("main start\n");
    while (coroutine_status(S, co1) && coroutine_status(S, co2)) {
        coroutine_resume(S, co1);
        coroutine_resume(S, co2);
    }
    printf("main end\n");
}

int main() {
    struct schedule *S = coroutine_open();
    test(S);
    coroutine_close(S);
    return 0;
}
```

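First, coroutine_open() creates a schedule. A schedule is an execution environment: every coroutine in it runs on the schedule's shared stack.
In test(), two calls to coroutine_new() create two coroutine objects, each representing an independently schedulable coroutine.
The while loop resumes them in turn with coroutine_resume() for as long as both are still alive (coroutine_status() returns 0, i.e. COROUTINE_DEAD, only once a coroutine has finished).
Inside the coroutine body foo(), each iteration of the bounded loop prints a line and then calls coroutine_yield() to give up control.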

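The program's control flow thus bounces back and forth between test() and the two instances of foo() running in co1 and co2; that is exactly what coroutines are.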

Design and Implementation

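First, some terminology. The coroutine library implements asymmetric coroutines: put simply, when a coroutine yields, control can only return to whoever resumed it.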

Symmetric and asymmetric coroutines - zhihu

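The library also uses a shared stack: when resume() is called, the saved stack of the coroutine being resumed is copied into the shared stack, and when a coroutine calls yield(), its stack contents are copied out of the shared stack into a private buffer owned by that coroutine.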

Shared-stack mode - zhihu

The coroutine library uses only two data structures, schedule and coroutine:

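```c
#define STACK_SIZE (1024*1024)   // size of the shared stack: 1 MiB
#define DEFAULT_COROUTINE 16     // initial capacity of S->co

struct schedule {
    char stack[STACK_SIZE];  // shared stack
    ucontext_t main;         // context of the main program (the caller of resume)
    int nco;                 // number of live coroutines
    int cap;                 // capacity of S->co
    int running;             // id of the running coroutine, or -1 if none is running
    struct coroutine **co;   // array of S->cap pointers to struct coroutine
};

struct coroutine {
    coroutine_func func;     // function the coroutine runs
    void *ud;                // argument passed to func
    ucontext_t ctx;          // the coroutine's context
    struct schedule *sch;    // schedule that owns this coroutine
    ptrdiff_t cap;           // capacity of the private stack buffer
    ptrdiff_t size;          // number of stack bytes currently saved
    int status;              // current state
    char *stack;             // private stack buffer (saved copy of the stack)
};
```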

schedule and coroutine are in a one-to-many relationship: a schedule object is an execution environment, and each coroutine object represents one coroutine.

A coroutine can be in one of four states:

```c
#define COROUTINE_DEAD 0
#define COROUTINE_READY 1
#define COROUTINE_RUNNING 2
#define COROUTINE_SUSPEND 3
```

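As the names suggest, a newly created coroutine is COROUTINE_READY, a running one is COROUTINE_RUNNING, and a suspended one is COROUTINE_SUSPEND. Once a coroutine finishes, it is destroyed and its slot is cleared, so coroutine_status() reports COROUTINE_DEAD for it.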
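The transitions described above can be summarized as a small table. The function below is a hypothetical helper, not part of the library; it encodes which state changes coroutine_resume(), coroutine_yield(), and mainfunc() perform. Strictly, the library never stores COROUTINE_DEAD in a coroutine object: it frees the object, and coroutine_status() reports DEAD for the empty slot.

```c
#include <stdbool.h>

#define COROUTINE_DEAD 0
#define COROUTINE_READY 1
#define COROUTINE_RUNNING 2
#define COROUTINE_SUSPEND 3

/* Hypothetical helper: is from -> to a transition the library performs? */
static bool is_valid_transition(int from, int to) {
    switch (from) {
    case COROUTINE_READY:    /* first coroutine_resume() */
        return to == COROUTINE_RUNNING;
    case COROUTINE_RUNNING:  /* coroutine_yield(), or the body returning in mainfunc() */
        return to == COROUTINE_SUSPEND || to == COROUTINE_DEAD;
    case COROUTINE_SUSPEND:  /* a later coroutine_resume() */
        return to == COROUTINE_RUNNING;
    default:                 /* DEAD is terminal */
        return false;
    }
}
```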

Let's start with coroutine_open() and coroutine_close(); all they do is allocate and free a struct schedule object.

```c
struct schedule *coroutine_open(void) {
    struct schedule *S = malloc(sizeof(*S));
    S->nco = 0;
    S->cap = DEFAULT_COROUTINE;
    S->running = -1;
    S->co = malloc(sizeof(struct coroutine *) * S->cap);
    memset(S->co, 0, sizeof(struct coroutine *) * S->cap);
    return S;
}
```

coroutine_open() returns an initialized schedule whose coroutine table starts with DEFAULT_COROUTINE (16) slots; as shown below, coroutine_new() grows the table on demand.

```c
void _co_delete(struct coroutine *co) {
    free(co->stack);
    free(co);
}

void coroutine_close(struct schedule *S) {
    int i;
    for (i = 0; i < S->cap; i++) {
        struct coroutine *co = S->co[i];
        if (co) {
            _co_delete(co);
        }
    }
    free(S->co);
    S->co = NULL;
    free(S);
}
```

coroutine_close() frees every remaining coroutine object with _co_delete(), then frees the coroutine table and the schedule itself.

**coroutine_new()** creates a new coroutine object and registers it in the schedule, first growing the schedule's table if it is full:

```c
struct coroutine *_co_new(struct schedule *S, coroutine_func func, void *ud) {
    struct coroutine *co = malloc(sizeof(*co));
    co->func = func;
    co->ud = ud;
    co->sch = S;
    co->cap = 0;
    co->size = 0;
    co->status = COROUTINE_READY;
    co->stack = NULL;   // no stack is allocated at creation time
    return co;
}

int coroutine_new(struct schedule *S, coroutine_func func, void *ud) {
    struct coroutine *co = _co_new(S, func, ud);
    if (S->nco >= S->cap) {
        // S->co is full: double its capacity
        int id = S->cap;
        S->co = realloc(S->co, S->cap * 2 * sizeof(struct coroutine *));
        memset(S->co + S->cap, 0, sizeof(struct coroutine *) * S->cap);
        S->co[S->cap] = co;
        S->cap *= 2;
        ++S->nco;
        return id;
    } else {
        int i;
        for (i = 0; i < S->cap; i++) {
            // trick: start scanning at index nco, since the slots from nco
            // onward are the ones most likely to be free
            int id = (i + S->nco) % S->cap;
            if (S->co[id] == NULL) {
                S->co[id] = co;
                ++S->nco;
                return id;   // the coroutine id is the handle users pass to the other APIs
            }
        }
    }
    assert(0);
    return -1;
}
```

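Note that a freshly created coroutine is READY and already stores the function to run and its argument, but it has no private stack yet. It starts out running directly on the shared stack, and a private buffer (C->stack) is only allocated the first time its stack has to be saved, i.e. on its first coroutine_yield() (see _save_stack() below). This lazy allocation avoids unnecessary memory use.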
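The id-allocation part of coroutine_new() can be isolated into a small standalone sketch. The slot_table type and its functions are hypothetical; they store plain void * entries instead of struct coroutine *, but follow the same scheme: double the table when it is full, otherwise scan for a free slot starting at index n (the counterpart of S->nco).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct slot_table {   /* hypothetical stand-in for the S->co / S->nco / S->cap fields */
    void **slots;
    int n;            /* number of occupied slots, like S->nco */
    int cap;          /* like S->cap */
};

static void slot_table_init(struct slot_table *t, int cap) {
    t->slots = calloc((size_t)cap, sizeof(void *));
    assert(t->slots != NULL);
    t->n = 0;
    t->cap = cap;
}

/* Stores item in a free slot and returns its index (the "id"). */
static int slot_table_add(struct slot_table *t, void *item) {
    if (t->n >= t->cap) {
        /* full: double the capacity; the first new slot is free by construction */
        int id = t->cap;
        void **grown = realloc(t->slots, (size_t)t->cap * 2 * sizeof(void *));
        assert(grown != NULL);
        t->slots = grown;
        memset(t->slots + t->cap, 0, (size_t)t->cap * sizeof(void *));
        t->slots[id] = item;
        t->cap *= 2;
        t->n++;
        return id;
    }
    for (int i = 0; i < t->cap; i++) {
        int id = (i + t->n) % t->cap;   /* start the scan at index n */
        if (t->slots[id] == NULL) {
            t->slots[id] = item;
            t->n++;
            return id;
        }
    }
    assert(0);   /* unreachable while n < cap */
    return -1;
}

static void slot_table_remove(struct slot_table *t, int id) {
    t->slots[id] = NULL;   /* like mainfunc() clearing S->co[id] */
    t->n--;
}
```

Removing slot 0 and adding again returns a higher index first, because the scan starts at n rather than at 0.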

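**coroutine_status()** returns the state of the coroutine with the given id, and **coroutine_running()** returns the id of the coroutine that is currently running (-1 if none):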

```c
int coroutine_status(struct schedule *S, int id) {
    assert(id >= 0 && id < S->cap);
    if (S->co[id] == NULL) {
        return COROUTINE_DEAD;
    }
    return S->co[id]->status;
}

int coroutine_running(struct schedule *S) {
    return S->running;
}
```

That leaves two APIs, coroutine_resume() and coroutine_yield(), which form the core of the coroutine machinery.

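First, be clear about where each one is used: coroutine_resume() wakes up a coroutine and must be called from outside any coroutine (it asserts S->running == -1); coroutine_yield() gives up the current flow of control, must be called from inside a coroutine, and returns control to the point where resume() was called.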

Let's start with **coroutine_resume()**:

```c
void coroutine_resume(struct schedule *S, int id) {
    assert(S->running == -1);
    assert(id >= 0 && id < S->cap);
    struct coroutine *C = S->co[id];
    if (C == NULL)
        return;
    int status = C->status;
    switch (status) {
    case COROUTINE_READY:
        // first resume: capture a context and point its stack at the shared stack S->stack
        getcontext(&C->ctx);
        C->ctx.uc_stack.ss_sp = S->stack;
        C->ctx.uc_stack.ss_size = STACK_SIZE;
        C->ctx.uc_link = &S->main;   // when mainfunc returns, continue at the end of this function
        S->running = id;
        C->status = COROUTINE_RUNNING;
        uintptr_t ptr = (uintptr_t) S;
        makecontext(&C->ctx, (void (*)(void)) mainfunc, 2, (uint32_t) ptr, (uint32_t) (ptr >> 32));
        swapcontext(&S->main, &C->ctx);   // starts mainfunc, running on the shared stack S->stack
        break;
    case COROUTINE_SUSPEND:
        // copy the saved stack back to the top of the shared stack
        memcpy(S->stack + STACK_SIZE - C->size, C->stack, C->size);
        S->running = id;
        C->status = COROUTINE_RUNNING;
        swapcontext(&S->main, &C->ctx);   // resumes the coroutine inside its earlier coroutine_yield()
        break;
    default:
        assert(0);
    }
}
```

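If this is the coroutine's first resume(), it sets things up: it saves the current context into C->ctx, makes the shared stack S->stack the stack of C->ctx, sets C->ctx.uc_link to &S->main, makes mainfunc the entry function of the context, and marks the coroutine RUNNING.
If the coroutine was suspended, it copies the coroutine's saved stack back to the top of the shared stack and marks it RUNNING; no new function is started.
Either way, it then calls swapcontext(&S->main, &C->ctx), which saves the current context into S->main and switches to C->ctx. On the first resume, C->ctx was prepared by makecontext(), so this starts mainfunc on the shared stack; on later resumes, C->ctx was saved inside coroutine_yield(), so the coroutine continues from its last yield.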

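Note that S->main is only ever set here, so S->main always represents the context at the end of coroutine_resume(): switching to it makes this swapcontext() call return, after which coroutine_resume() breaks out of the switch and returns.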

So what does mainfunc() do?

```c
static void mainfunc(uint32_t low32, uint32_t hi32) {
    // reassemble the struct schedule * from its low and high 32 bits
    uintptr_t ptr = (uintptr_t) low32 | ((uintptr_t) hi32 << 32);
    struct schedule *S = (struct schedule *) ptr;
    int id = S->running;
    struct coroutine *C = S->co[id];   // the coroutine being started
    C->func(S, C->ud);   // run the body; it may call coroutine_yield(), so this may not return for a while
    _co_delete(C);       // once func returns, the coroutine has finished: destroy it
    S->co[id] = NULL;
    --S->nco;
    S->running = -1;
}
```

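It looks up the coroutine object for the running coroutine, fetches its function pointer and argument, and calls C->func(S, C->ud).
Once that call returns, the coroutine's body has finished, so the coroutine is destroyed and the schedule's bookkeeping is updated. mainfunc() then returns, and because uc_link points to S->main, control goes back to the end of coroutine_resume().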

Note, however, that C->func() usually does not return right away; in main.c, for example, foo() calls coroutine_yield() five times in its for loop before it returns.

Now let's look at coroutine_yield():

```c
void coroutine_yield(struct schedule *S) {
    int id = S->running;
    assert(id >= 0);
    struct coroutine *C = S->co[id];
    assert((char *) &C > S->stack);          // the local C must live on the shared stack
    _save_stack(C, S->stack + STACK_SIZE);   // copy the used part of S->stack into C->stack
    C->status = COROUTINE_SUSPEND;
    S->running = -1;
    swapcontext(&C->ctx, &S->main);          // jump to the end of coroutine_resume() (which then returns)
}
```

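It first saves the coroutine's stack and then marks the coroutine SUSPEND.
swapcontext(&C->ctx, &S->main) saves the context at the end of coroutine_yield() into C->ctx and switches to the context in S->main. As established in the analysis of coroutine_resume(), S->main always represents the end of coroutine_resume(), so control "jumps" there and then returns to whoever called coroutine_resume().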

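Conversely, for a coroutine that has called coroutine_yield(), C->ctx holds the context at the end of coroutine_yield(). When it is resumed again, control jumps back there, coroutine_yield() returns into the coroutine's body, and execution continues right after the coroutine_yield() call.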
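To isolate the resume/yield handshake from the shared-stack bookkeeping, here is a hypothetical single-coroutine generator built directly on swapcontext(). It is not how coroutine.c stores stacks: each generator gets its own private 64 KiB stack (an arbitrary size), so nothing is copied. The field caller plays the role of S->main and self the role of C->ctx; the names struct gen, gen_resume, and gen_yield are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

enum { GEN_STACK_SIZE = 64 * 1024 };

struct gen {
    ucontext_t caller;   /* like S->main: where gen_yield() returns to */
    ucontext_t self;     /* like C->ctx: where gen_resume() continues */
    char *stack;         /* private stack, so no copying is needed */
    int value;           /* last yielded value */
    int limit;           /* how many values to produce */
    int done;            /* set once the body has returned */
};

static struct gen *current;   /* the running generator, like S->running */

static void gen_yield(int v) {
    struct gen *g = current;
    g->value = v;
    swapcontext(&g->self, &g->caller);   /* back to the end of gen_resume() */
}

static void gen_body(void) {
    struct gen *g = current;
    for (int i = 0; i < g->limit; i++)
        gen_yield(i * i);
    g->done = 1;   /* returning now switches to uc_link, i.e. g->caller */
}

static void gen_init(struct gen *g, int limit) {
    g->stack = malloc(GEN_STACK_SIZE);
    assert(g->stack != NULL);
    g->limit = limit;
    g->done = 0;
    getcontext(&g->self);
    g->self.uc_stack.ss_sp = g->stack;
    g->self.uc_stack.ss_size = GEN_STACK_SIZE;
    g->self.uc_link = &g->caller;
    makecontext(&g->self, gen_body, 0);
}

/* Runs the body until its next yield: returns 1 with the value in *out, or 0 when finished. */
static int gen_resume(struct gen *g, int *out) {
    if (g->done)
        return 0;
    current = g;
    swapcontext(&g->caller, &g->self);
    if (g->done)
        return 0;
    *out = g->value;
    return 1;
}

static void gen_free(struct gen *g) {
    free(g->stack);
}
```

The first gen_resume() enters gen_body() through makecontext(), just like the READY branch of coroutine_resume(); every later call lands inside gen_yield(), just like the SUSPEND branch.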

The figure below shows the coroutine control flow built around the ucontext API:

One function remains to be explained: _save_stack():

```c
static void _save_stack(struct coroutine *C, const char *top) {
    // trick: C is running on S->stack, so the local variable dummy also lives in S->stack.
    // [&dummy, top) is therefore all the stack the coroutine has used so far,
    // which means there is no need to save the whole of S->stack.
    char dummy = 0;
    assert(top - &dummy <= STACK_SIZE);
    if (C->cap < top - &dummy) {
        // first save, or the stack has grown: (re)allocate C->stack
        free(C->stack);
        C->cap = top - &dummy;
        C->stack = malloc(C->cap);
    }
    C->size = top - &dummy;
    memcpy(C->stack, &dummy, C->size);
}
```

When execution is inside _save_stack(), the shared stack looks like this (it grows downward from top = S->stack + STACK_SIZE):

```
 top = S->stack + STACK_SIZE
 ┌──────────────────────────────┐ ◀─┐
 │ mainfunc() stack frame       │   │
 ├──────────────────────────────┤   │
 │ foo() (C->func) stack frame  │   │  active coroutine stack:
 ├──────────────────────────────┤   │  [&dummy, top), copied into C->stack
 │ coroutine_yield() frame      │   │
 ├──────────────────────────────┤   │
 │ _save_stack() frame          │   │
 │   ...                        │   │
 │   dummy                      │ ◀─┘
 ├──────────────────────────────┤
 │                              │
 │ unused                       │
 │                              │
 └──────────────────────────────┘
 S->stack
```

The frames of test() and coroutine_resume() do not appear here: they stay on the thread's original stack, while the coroutine runs entirely on S->stack.

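So in _save_stack(), top - &dummy is the size of the coroutine's live stack. The shared stack S->stack has a fixed size of 1 MiB (STACK_SIZE), while a coroutine typically uses far less than that, so saving only those top - &dummy bytes on each switch, rather than the whole shared stack, greatly reduces the memory cost per coroutine.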

This is a common trick for implementing shared stacks; libco has a similar implementation.
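The copy arithmetic can be checked without any context switching. The sketch below uses a plain array as a stand-in for S->stack and hypothetical save_region/restore_region helpers. Unlike _save_stack(), it takes the number of used bytes as a parameter instead of computing top - &dummy, but like _save_stack() and the COROUTINE_SUSPEND branch of coroutine_resume(), it copies only the used bytes at the top of the shared buffer and later writes them back at shared + size - used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { SHARED_SIZE = 1024 };   /* small stand-in for STACK_SIZE */

struct saved {                 /* hypothetical stand-in for C->stack / C->cap / C->size */
    char *buf;
    size_t cap;
    size_t size;
};

/* Saves the top `used` bytes of the shared buffer, growing the private copy only when needed. */
static void save_region(struct saved *s, const char *shared, size_t used) {
    assert(used <= SHARED_SIZE);
    if (s->cap < used) {
        free(s->buf);
        s->buf = malloc(used);
        assert(s->buf != NULL);
        s->cap = used;
    }
    s->size = used;
    memcpy(s->buf, shared + SHARED_SIZE - used, used);
}

/* Writes the saved bytes back to the same place: the top of the shared buffer. */
static void restore_region(const struct saved *s, char *shared) {
    memcpy(shared + SHARED_SIZE - s->size, s->buf, s->size);
}
```

Writing the bytes back at the same addresses matters in the real library: saved frames contain pointers into S->stack (frame pointers, addresses of locals), which stay valid only because the stack is restored to exactly where it was.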