gc: a garbage collection library for C

GitHub: gc, a garbage collection library for C

gc: mark & sweep garbage collection for C

gc is an implementation of a conservative, thread-local, mark-and-sweep
garbage collector. The implementation provides a fully functional replacement
for the standard POSIX malloc(), calloc(), realloc(), and free() calls.

The focus of gc is to provide a conceptually clean implementation of
a mark-and-sweep GC, without delving into the depths of architecture-specific
optimization (see e.g. the Boehm GC for such an undertaking). It
should be particularly suitable for learning purposes and is open for all kinds
of optimization (PRs welcome!).

The original motivation for gc is my desire to write my own LISP
in C, entirely from scratch - and that required garbage collection.

Acknowledgements

This work would not have been possible without the ability to read the work of
others, most notably the Boehm GC, orangeduck’s tgc (which also
follows the ideals of being tiny and simple), and The Garbage Collection
Handbook.

Table of contents

  • Table of contents
  • Documentation Overview
  • Quickstart
    • Download, compile and test
    • Basic usage
  • Core API
    • Starting, stopping, pausing, resuming and running GC
    • Memory allocation and deallocation
    • Helper functions
  • Basic Concepts
    • Data Structures
    • Garbage collection
    • Reachability
    • The Mark-and-Sweep Algorithm
    • Finding roots
    • Depth-first recursive marking
    • Dumping registers on the stack
    • Sweeping

Documentation Overview

  • Read the quickstart below to see how to get started quickly
  • The concepts section describes the basic concepts and design
    decisions that went into the implementation of gc.
  • Interleaved with the concepts, there are implementation sections that detail
    the implementation of the core components, see hash map
    implementation, dumping registers on the
    stack, finding roots, and
    depth-first, recursive marking.

Quickstart

Download, compile and test

$ git clone git@github.com:mkirchner/gc.git
$ cd gc

To compile using the clang compiler:

$ make test

To use the GNU Compiler Collection (GCC):

$ make test CC=gcc

The tests should complete successfully. To create the current coverage report:

$ make coverage

Basic usage

...
#include "gc.h"
...

void some_fun() {
    ...
    int* my_array = gc_calloc(&gc, 1024, sizeof(int));
    for (size_t i = 0; i < 1024; ++i) {
        my_array[i] = 42;
    }
    ...
    // look ma, no free!
}

int main(int argc, char* argv[]) {
    gc_start(&gc, &argc);
    ...
    some_fun();
    ...
    gc_stop(&gc);
    return 0;
}

Core API

This section describes the core API; see gc.h for more details and the low-level API.

Starting, stopping, pausing, resuming and running GC

In order to initialize and start garbage collection, use the gc_start()
function and pass a bottom-of-stack address:

void gc_start(GarbageCollector* gc, void* bos);

The bottom-of-stack parameter bos needs to point to a stack-allocated
variable and marks the low end of the stack from where root
finding (scanning) starts.

Garbage collection can be stopped, paused and resumed with

void gc_stop(GarbageCollector* gc);
void gc_pause(GarbageCollector* gc);
void gc_resume(GarbageCollector* gc);

and manual garbage collection can be triggered with

size_t gc_run(GarbageCollector* gc);

Memory allocation and deallocation

gc supports malloc()-, calloc()- and realloc()-style memory allocation.
The respective function signatures mimic the POSIX functions (with the
exception that we need to pass the garbage collector along as the first
argument):

void* gc_malloc(GarbageCollector* gc, size_t size);
void* gc_calloc(GarbageCollector* gc, size_t count, size_t size);
void* gc_realloc(GarbageCollector* gc, void* ptr, size_t size);

It is possible to pass a pointer to a destructor function through the
extended interface:

void dtor(void* ptr) {
    SomeObject* obj = ptr;
    // do some cleanup work
    obj->parent->deregister();
    obj->db->disconnect();
    ...
    // no need to free obj
}
...
SomeObject* obj = gc_malloc_ext(gc, sizeof(SomeObject), dtor);
...

gc supports static allocations that are garbage collected only when the
GC shuts down via gc_stop(). Just use the appropriate helper function:

void* gc_malloc_static(GarbageCollector* gc, size_t size, void (*dtor)(void*));

Static allocation expects a pointer to a finalization function; just set to
NULL if finalization is not required.

Note that gc currently does not guarantee a specific ordering when it
collects static variables. If static variables need to be deallocated in a
particular order, the user should call gc_free() on them in the desired
sequence prior to calling gc_stop(), see below.

It is also possible to trigger explicit memory deallocation using

void gc_free(GarbageCollector* gc, void* ptr);

Calling gc_free() is guaranteed to (a) run the destructor on the object
pointed to by ptr, if one is registered, and (b) free the memory that ptr
points to, irrespective of the current garbage collection schedule. It also
works if GC has been paused using gc_pause() above.

Helper functions

gc also offers a strdup() implementation that returns a garbage-collected
copy:

char* gc_strdup(GarbageCollector* gc, const char* s);

Basic Concepts

The fundamental idea behind garbage collection is to automate the memory
allocation/deallocation cycle. This is accomplished by keeping track of all
allocated memory and periodically triggering deallocation for memory that is
still allocated but unreachable.

Many advanced garbage collectors also implement their own approach to memory
allocation (i.e. replace malloc()). This often enables them to lay out memory
in a more space-efficient manner or for faster access but comes at the price of
architecture-specific implementations and increased complexity. gc sidesteps
these issues by falling back on the POSIX *alloc() implementations and keeping
memory management and garbage collection metadata separate. This makes gc
much simpler to understand but, of course, also less space- and time-efficient
than more optimized approaches.

Data Structures

The core data structure inside gc is a hash map that maps the address of
allocated memory to the garbage collection metadata of that memory.

The items in the hash map are allocations, modeled with the Allocation
struct:

typedef struct Allocation {
    void* ptr;                // mem pointer
    size_t size;              // allocated size in bytes
    char tag;                 // the tag for mark-and-sweep
    void (*dtor)(void*);      // destructor
    struct Allocation* next;  // separate chaining
} Allocation;

Each Allocation instance holds a pointer to the allocated memory, the size of
the allocated memory at that location, a tag for mark-and-sweep (see below), an
optional pointer to the destructor function and a pointer to the next
Allocation instance (for separate chaining, see below).

The allocations are collected in an AllocationMap

typedef struct AllocationMap {
    size_t capacity;
    size_t min_capacity;
    double downsize_factor;
    double upsize_factor;
    double sweep_factor;
    size_t sweep_limit;
    size_t size;
    Allocation** allocs;
} AllocationMap;

that, together with a set of static functions inside gc.c, provides hash
map semantics for the implementation of the public API.

The AllocationMap is the central data structure in the GarbageCollector
struct which is part of the public API:

typedef struct GarbageCollector {
    struct AllocationMap* allocs;
    bool paused;
    void *bos;
    size_t min_size;
} GarbageCollector;

With the basic data structures in place, any gc_*alloc() memory allocation
request is a two-step procedure: first, allocate the memory through system (i.e.
standard malloc()) functionality and second, add or update the associated
metadata to the hash map.

gc_free() uses the pointer to locate the metadata in the hash map,
determines whether the deallocation requires a destructor call, calls the
destructor if required, frees the managed memory and deletes the metadata
entry from the hash map.

These data structures and the associated interfaces enable the
management of the metadata required to build a garbage collector.
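
To make the two-step allocation and the gc_free() lookup more concrete, here is a minimal, self-contained sketch of a pointer-keyed hash map with separate chaining. It is not the code from gc.c: the names Entry, tracked_malloc(), tracked_free(), tracked_size() and tracked_count() are hypothetical, the bucket count is fixed, and the hash (address shifted right by three bits) is just one plausible choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified metadata record (not the library's Allocation). */
typedef struct Entry {
    void* ptr;             /* address returned by malloc(), used as the key */
    size_t size;           /* requested size in bytes */
    void (*dtor)(void*);   /* optional destructor, may be NULL */
    struct Entry* next;    /* next entry in the same bucket */
} Entry;

enum { N_BUCKETS = 64 };   /* fixed capacity; a real map would resize */
static Entry* buckets[N_BUCKETS];
static size_t n_entries;

/* Drop the low bits, which are usually zero due to alignment. */
static size_t bucket_of(const void* ptr)
{
    return (size_t)(((uintptr_t)ptr >> 3) % N_BUCKETS);
}

/* Step 1: allocate via the system allocator. Step 2: record the metadata. */
void* tracked_malloc(size_t size, void (*dtor)(void*))
{
    void* ptr = malloc(size);
    if (!ptr) return NULL;
    Entry* e = malloc(sizeof *e);
    if (!e) { free(ptr); return NULL; }
    size_t i = bucket_of(ptr);
    e->ptr = ptr;
    e->size = size;
    e->dtor = dtor;
    e->next = buckets[i];   /* prepend to the bucket's chain */
    buckets[i] = e;
    ++n_entries;
    return ptr;
}

/* Returns the recorded size, or 0 if ptr is not tracked. */
size_t tracked_size(const void* ptr)
{
    for (Entry* e = buckets[bucket_of(ptr)]; e; e = e->next) {
        if (e->ptr == ptr) return e->size;
    }
    return 0;
}

size_t tracked_count(void) { return n_entries; }

/* Locate the metadata, run the destructor if any, free memory and entry. */
void tracked_free(void* ptr)
{
    Entry** link = &buckets[bucket_of(ptr)];
    while (*link && (*link)->ptr != ptr) link = &(*link)->next;
    if (!*link) return;     /* unknown pointer: nothing to do */
    Entry* e = *link;
    *link = e->next;        /* unlink from the chain */
    if (e->dtor) e->dtor(e->ptr);
    free(e->ptr);
    free(e);
    --n_entries;
}
```

Keeping the metadata in a separate map means the user-visible block stays a plain malloc() block, at the cost of one extra allocation and one hash lookup per operation.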

Garbage collection

gc triggers collection under two circumstances: (a) when a call to the
system allocator fails (in the hope of freeing enough memory to fulfill the
current request); and (b) when the number of entries in the hash map passes a
dynamically adjusted high water mark.
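
As an illustration of case (b), the sketch below shows one way a dynamically adjusted high water mark could be derived from the size, capacity, sweep_factor and sweep_limit fields of the AllocationMap. The policy (re-arm the limit at the current size plus a fraction of the remaining headroom) is an assumption for illustration, and MapStats, update_sweep_limit() and needs_sweep() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the map bookkeeping that drives the trigger. */
typedef struct {
    size_t capacity;      /* number of buckets */
    size_t size;          /* number of live entries */
    double sweep_factor;  /* fraction of the free headroom to allow, 0..1 */
    size_t sweep_limit;   /* high water mark: collect once size exceeds it */
} MapStats;

/* Assumed policy: place the mark relative to the current fill level. */
void update_sweep_limit(MapStats* m)
{
    assert(m->sweep_factor >= 0.0 && m->sweep_factor <= 1.0);
    size_t headroom = m->capacity > m->size ? m->capacity - m->size : 0;
    m->sweep_limit = m->size + (size_t)(m->sweep_factor * (double)headroom);
}

/* True when the number of entries has passed the high water mark. */
bool needs_sweep(const MapStats* m)
{
    return m->size > m->sweep_limit;
}
```

With such a policy, a collection that frees little leaves the mark close to the current size, while a collection that frees a lot moves the mark further away in absolute terms.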

If either of these cases occurs, gc stops the world and starts a
mark-and-sweep garbage collection run over all current allocations. This
functionality is implemented in the gc_run() function which is part of the
public API and delegates all work to the gc_mark() and gc_sweep() functions
that are part of the private API.

gc_mark() has the task of finding roots and tagging all
known allocations that are referenced from a root (or from an allocation that
is referenced from a root, i.e. transitively) as “used”. Once marking is
completed, gc_sweep() iterates over all known allocations and
deallocates all unused (i.e. unmarked) allocations, returns to gc_run() and
the world continues to run.

Reachability

gc will keep memory allocations that are reachable and collect everything
else. An allocation is considered reachable if any of the following is true:

  1. There is a pointer on the stack that points to the allocation content.
    The pointer must reside in a stack frame that is at least as deep in the call
    stack as the bottom-of-stack variable passed to gc_start() (i.e. bos is
    the smallest stack address considered during the mark phase).
  2. There is a pointer inside gc_*alloc()-allocated content that points to the
    allocation content.
  3. The allocation is tagged with GC_TAG_ROOT.

The Mark-and-Sweep Algorithm

The naïve mark-and-sweep algorithm runs in two stages. First, in a mark
stage, the algorithm finds and marks all root allocations and all allocations
that are reachable from the roots. Second, in the sweep stage, the algorithm
passes over all known allocations, collecting all allocations that were not
marked and are therefore deemed unreachable.

Finding roots

At the beginning of the mark stage, we first sweep across all known
allocations and find explicit roots with the GC_TAG_ROOT tag set.
Each of these roots is a starting point for depth-first recursive
marking.

gc subsequently detects all roots in the stack (starting from the bottom-of-stack
pointer bos that is passed to gc_start()) and the registers (by dumping them
on the stack prior to the mark phase) and
uses these as starting points for marking as well.
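
The stack scan can be pictured as a conservative walk over a memory range. The following is a minimal sketch, not the code from gc.c: RootScan and scan_range() are hypothetical names, and a small array of known start addresses stands in for the hash map lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the hash map: a list of allocation addresses. */
typedef struct {
    void* const* known;   /* start addresses of tracked allocations */
    bool* found;          /* found[i] is set when known[i] is referenced */
    size_t n_known;
} RootScan;

/* Scan [lo, hi) byte by byte and treat every pointer-sized window as a
 * candidate pointer. Returns the number of windows that matched. */
size_t scan_range(RootScan* s, const void* lo, const void* hi)
{
    const char* p = lo;
    const char* end = hi;
    size_t hits = 0;
    assert(p <= end);
    /* stop while a full pointer still fits into the remaining bytes */
    for (; (size_t)(end - p) >= sizeof(void*); ++p) {
        void* candidate;
        memcpy(&candidate, p, sizeof candidate);   /* unaligned-safe read */
        for (size_t i = 0; i < s->n_known; ++i) {
            if (candidate == s->known[i]) {
                s->found[i] = true;
                ++hits;
            }
        }
    }
    return hits;
}
```

The scan is conservative: any bit pattern that happens to equal a tracked address keeps that allocation alive, whether or not it is really a pointer. In this sketch only exact start addresses match, which mirrors a lookup keyed by allocation address.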

Depth-first recursive marking

Given a root allocation, marking consists of (1) setting the tag field in an
Allocation object to GC_TAG_MARK and (2) scanning the allocated memory for
pointers to known allocations, recursively repeating the process.

The underlying implementation is a simple, recursive depth-first search that
scans over all memory content to find potential references:

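void gc_mark_alloc(GarbageCollector* gc, void* ptr)
{
    Allocation* alloc = gc_allocation_map_get(gc->allocs, ptr);
    /* mark if the allocation is known and not yet tagged */
    if (alloc && !(alloc->tag & GC_TAG_MARK)) {
        alloc->tag |= GC_TAG_MARK;
        /* treat every pointer-sized window as a potential reference; stop
           while a full pointer still fits to avoid reading past the end */
        char* end = (char*) alloc->ptr + alloc->size;
        for (char* p = (char*) alloc->ptr;
             (size_t)(end - p) >= sizeof(void*);
             ++p) {
            gc_mark_alloc(gc, *(void**)p);
        }
    }
}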
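
Recursive marking uses one call frame per link in a pointer chain, so a very long linked list can exhaust the C stack. A common alternative, shown here only as a sketch and not as what gc does, replaces the recursion with an explicit worklist. The Node type with explicit child pointers and the mark_from() function are hypothetical and stand in for conservatively scanned allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { MAX_CHILDREN = 2 };

/* Hypothetical toy object with explicit outgoing references. */
typedef struct Node {
    bool marked;
    struct Node* child[MAX_CHILDREN];
} Node;

/* Mark everything reachable from root using an explicit stack.
 * max_nodes must be at least the total number of nodes in the graph.
 * Returns the number of newly marked nodes, or (size_t)-1 if the
 * worklist could not be allocated. */
size_t mark_from(Node* root, size_t max_nodes)
{
    if (!root || root->marked) return 0;
    assert(max_nodes > 0);
    Node** work = malloc(max_nodes * sizeof *work);
    if (!work) return (size_t)-1;
    size_t top = 0, n_marked = 0;
    root->marked = true;          /* mark on push: each node enters once */
    work[top++] = root;
    while (top > 0) {
        Node* n = work[--top];
        ++n_marked;
        for (size_t i = 0; i < MAX_CHILDREN; ++i) {
            Node* c = n->child[i];
            if (c && !c->marked) {
                assert(top < max_nodes);
                c->marked = true;
                work[top++] = c;
            }
        }
    }
    free(work);
    return n_marked;
}
```

Marking a node when it is pushed, rather than when it is popped, is what makes cycles terminate and bounds the worklist by the number of nodes.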

In gc.c, gc_mark() starts the marking process by marking the
explicitly tagged roots via a call to gc_mark_roots(). To mark these roots we
do one full pass through all known allocations. We then proceed to dump the
registers on the stack.

Dumping registers on the stack

In order to make the CPU register contents available for root finding, gc
dumps them on the stack. This is implemented in a somewhat portable way using
setjmp(), which stores them in a jmp_buf variable right before we mark the
stack:

...
/* Dump registers onto stack and scan the stack */
void (*volatile _mark_stack)(GarbageCollector*) = gc_mark_stack;
jmp_buf ctx;
memset(&ctx, 0, sizeof(jmp_buf));
setjmp(ctx);
_mark_stack(gc);
...

The detour using the volatile function pointer _mark_stack to the
gc_mark_stack() function is necessary to avoid the inlining of the call to
gc_mark_stack().
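
The same pattern can be tried in isolation. The sketch below is an assumption-laden illustration, not the library's code: ScanFn, Probe, probe_scan() and spill_registers_and_scan() are hypothetical names, and the callback simply records the stack address it is handed instead of scanning.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scan callback: receives an address near the top of the stack. */
typedef void (*ScanFn)(const void* top_of_stack, void* user);

/* Example callback state: remembers the address and counts invocations. */
typedef struct {
    const void* top;
    int calls;
} Probe;

void probe_scan(const void* top_of_stack, void* user)
{
    Probe* probe = user;
    assert(probe != NULL);
    probe->top = top_of_stack;
    probe->calls++;
}

/* Runs in its own frame; a local variable approximates the stack top. */
static void call_scan(ScanFn scan, void* user)
{
    char marker = 0;
    scan(&marker, user);
}

void spill_registers_and_scan(ScanFn scan, void* user)
{
    /* calling through a volatile pointer keeps the call from being inlined */
    void (*volatile call)(ScanFn, void*) = call_scan;
    jmp_buf ctx;
    memset(&ctx, 0, sizeof ctx);
    setjmp(ctx);   /* stores the calling environment in ctx, on this frame */
    call(scan, user);
}
```

Because ctx lives in the frame of spill_registers_and_scan(), whatever register contents setjmp() saved there lie inside the stack range that the callee goes on to scan. Which registers setjmp() saves is platform-specific, hence "somewhat portable".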

Sweeping

After marking all memory that is reachable and therefore potentially still in
use, collecting the unreachable allocations is trivial. Here is the
implementation from gc_sweep():

size_t gc_sweep(GarbageCollector* gc)
{
    size_t total = 0;
    for (size_t i = 0; i < gc->allocs->capacity; ++i) {
        Allocation* chunk = gc->allocs->allocs[i];
        Allocation* next = NULL;
        while (chunk) {
            if (chunk->tag & GC_TAG_MARK) {
                /* unmark */
                chunk->tag &= ~GC_TAG_MARK;
                chunk = chunk->next;
            } else {
                total += chunk->size;
                if (chunk->dtor) {
                    chunk->dtor(chunk->ptr);
                }
                free(chunk->ptr);
                next = chunk->next;
                gc_allocation_map_remove(gc->allocs, chunk->ptr, false);
                chunk = next;
            }
        }
    }
    gc_allocation_map_resize_to_fit(gc->allocs);
    return total;
}

We iterate over all allocations in the hash map (the for loop), following every
chain (the while loop with the chunk = chunk->next update) and either (1)
unmark the chunk if it was marked; or (2) call the destructor on the chunk and
free the memory if it was not marked, keeping a running total of the amount of
memory we free.
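
The delicate part of sweeping a chain is removing an element while iterating over it. As a standalone sketch with hypothetical names (Chunk, chain_add(), sweep_chain(), chain_length()), a single chain can be swept with a pointer-to-pointer cursor, which unlinks a chunk before freeing it and needs no separate removal call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified chain element. */
typedef struct Chunk {
    void* ptr;
    size_t size;
    bool marked;
    struct Chunk* next;
} Chunk;

/* Prepend a tracked allocation; returns the new chunk or NULL on failure. */
Chunk* chain_add(Chunk** head, size_t size, bool marked)
{
    assert(head != NULL);
    Chunk* c = malloc(sizeof *c);
    if (!c) return NULL;
    c->ptr = malloc(size);
    if (!c->ptr) { free(c); return NULL; }
    c->size = size;
    c->marked = marked;
    c->next = *head;
    *head = c;
    return c;
}

/* Free unmarked chunks, clear the mark on survivors, return bytes freed. */
size_t sweep_chain(Chunk** head)
{
    size_t total = 0;
    Chunk** link = head;
    while (*link) {
        Chunk* c = *link;
        if (c->marked) {
            c->marked = false;    /* unmark for the next collection run */
            link = &c->next;
        } else {
            *link = c->next;      /* unlink before freeing */
            total += c->size;
            free(c->ptr);
            free(c);
        }
    }
    return total;
}

size_t chain_length(const Chunk* head)
{
    size_t n = 0;
    for (; head; head = head->next) ++n;
    return n;
}
```

Since survivors are unmarked during the sweep, a chunk that is not re-marked in the next run is collected then.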

That concludes the mark & sweep run. The stopped world is resumed and we’re
ready for the next run!
