关于如何理解Glibc堆管理器(Ⅹ——完结、补充、注释——Arena、heap_info、malloc

本篇实为个人笔记，可能存在些许错误；若各位师傅发现哪里存在错误，还望指正。感激不尽。

若有图片及文稿引用，将在本篇结尾处著名来源(也有置于篇首的情况)。

截至到本节内容，该系列算是正式完结了，后续或许会有补充，但基本上都将添加在本节内容中。在前几节中，笔者已经按照自己的思路尽可能详尽的将Glibc的堆管理器Ptmalloc2的方式做了一定的介绍，尽管Ptmalloc2的内容肯定不止这些，但已能大致了解其工作方式了

但也有一些必要的内容未曾在前几节中放出，诸如突然出现的Arena，以及Heap的结构等内容没能展开介绍，因此将这些内容补充在本系列最后一节

heap_info结构体：

malloc_state：

malloc_par：

Arena：

heap_info结构体：

typedef struct _heap_info
{mstate ar_ptr; /* Arena for this heap. */struct _heap_info *prev; /* Previous heap. */size_t size;   /* Current size in bytes. */size_t mprotect_size; /* Size in bytes that has been mprotectedPROT_READ|PROT_WRITE.  *//* Make sure the following data is properly aligned, particularlythat sizeof (heap_info) + 2 * SIZE_SZ is a multiple ofMALLOC_ALIGNMENT. */char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

每一个新开辟的堆都有一个独立的heap_info结构体

ar_ptr指针指向一个为该堆服务的arena
prev指针指向上一个堆的heap_info结构体
size记录了堆的大小
mprotect_size记录了堆中多大的空间是可读写的
pad字符串则用以堆其该结构体，使其能够按照0x10字节对齐(x86中则是8字节对齐)

这里引用CTF-WIKI中对pad的解释：

pad 里负数的缘由是什么呢？

pad 是为了确保分配的空间是按照 MALLOC_ALIGN_MASK+1 (记为 MALLOC_ALIGN_MASK_1) 对齐的。在 pad 之前该结构体一共有 6 个 SIZE_SZ 大小的成员, 为了确保 MALLOC_ALIGN_MASK_1 字节对齐, 可能需要进行 pad，不妨假设该结构体的最终大小为 MALLOC_ALIGN_MASK_1*x，其中 x 为自然数，那么需要 pad 的空间为 MALLOC_ALIGN_MASK_1 * x - 6 * SIZE_SZ = (MALLOC_ALIGN_MASK_1 * x - 6 * SIZE_SZ) % MALLOC_ALIGN_MASK_1 = 0 - 6 * SIZE_SZ % MALLOC_ALIGN_MASK_1=-6 * SIZE_SZ % MALLOC_ALIGN_MASK_1 = -6 * SIZE_SZ & MALLOC_ALIGN_MASK

malloc_state：

struct malloc_state
{/* Serialize access.  */__libc_lock_define (, mutex);/* Flags (formerly in max_fast).  */int flags;/* Set if the fastbin chunks contain recently inserted free blocks.  *//* Note this is a bool but not all targets support atomics on booleans.  */int have_fastchunks;/* Fastbins */mfastbinptr fastbinsY[NFASTBINS];/* Base of the topmost chunk -- not otherwise kept in a bin */mchunkptr top;/* The remainder from the most recent split of a small request */mchunkptr last_remainder;/* Normal bins packed as described above */mchunkptr bins[NBINS * 2 - 2];/* Bitmap of bins */unsigned int binmap[BINMAPSIZE];/* Linked list */struct malloc_state *next;/* Linked list for free arenas.  Access to this field is serializedby free_list_lock in arena.c.  */struct malloc_state *next_free;/* Number of threads attached to this arena.  0 if the arena is onthe free list.  Access to this field is serialized byfree_list_lock in arena.c.  */INTERNAL_SIZE_T attached_threads;/* Memory allocated from the system in this arena.  */INTERNAL_SIZE_T system_mem;INTERNAL_SIZE_T max_system_mem;
};

__libc_lock_define (, mutex)：笔者将其理解为一个开关(锁)，如果某个线程对这个堆进行操作时，就会将这个堆锁住，组织其他线程对这个堆的操作，直到其他线程发现这个变量被解开了，那么才会排队进行操作(笔者称锁住时为占用，否则为空闲)
flags：一个二进制数，bit0记录FastBins中是否有空闲块，bit1 标识分配区是否能返回连续的虚拟地址空间，具体定义见下面的定义表
fastbinsY[NFASTBINS]：一个存放了每个Fast Bin链表头指针的数组
top：指向堆中的Top chunk
last_reminder：指向最新切割chunk后剩余的部分
bins：用于存放各类Bins结构的数组
binmap：用来表示堆中是否还有空闲块
next：指向下一个相同类型结构体
next_free：指向下一个空闲的arena
attached_threads：指示有多少个线程连接这个堆
system_mem/max_system_mem：表示系统为这个堆分配了多少空间

/*FASTCHUNKS_BIT held in max_fast indicates that there are probablysome fastbin chunks. It is set true on entering a chunk into anyfastbin, and cleared only in malloc_consolidate.The truth value is inverted so that have_fastchunks will be trueupon startup (since statics are zero-filled), simplifyinginitialization checks.*/#define FASTCHUNKS_BIT (1U)#define have_fastchunks(M) (((M)->flags & FASTCHUNKS_BIT) == 0)
#define clear_fastchunks(M) catomic_or(&(M)->flags, FASTCHUNKS_BIT)
#define set_fastchunks(M) catomic_and(&(M)->flags, ~FASTCHUNKS_BIT)/*NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguousregions.  Otherwise, contiguity is exploited in merging together,when possible, results from consecutive MORECORE calls.The initial value comes from MORECORE_CONTIGUOUS, but ischanged dynamically if mmap is ever used as an sbrk substitute.*/#define NONCONTIGUOUS_BIT (2U)#define contiguous(M) (((M)->flags & NONCONTIGUOUS_BIT) == 0)
#define noncontiguous(M) (((M)->flags & NONCONTIGUOUS_BIT) != 0)
#define set_noncontiguous(M) ((M)->flags |= NONCONTIGUOUS_BIT)
#define set_contiguous(M) ((M)->flags &= ~NONCONTIGUOUS_BIT)/* ARENA_CORRUPTION_BIT is set if a memory corruption was detected on thearena.  Such an arena is no longer used to allocate chunks.  Chunksallocated in that arena before detecting corruption are not freed.  */#define ARENA_CORRUPTION_BIT (4U)#define arena_is_corrupt(A) (((A)->flags & ARENA_CORRUPTION_BIT))
#define set_arena_corrupt(A) ((A)->flags |= ARENA_CORRUPTION_BIT)

malloc_par：

struct malloc_par
{/* Tunable parameters */unsigned long trim_threshold;INTERNAL_SIZE_T top_pad;INTERNAL_SIZE_T mmap_threshold;INTERNAL_SIZE_T arena_test;INTERNAL_SIZE_T arena_max;/* Memory map support */int n_mmaps;int n_mmaps_max;int max_n_mmaps;/* the mmap_threshold is dynamic, until the user setsit manually, at which point we need to disable anydynamic behavior. */int no_dyn_threshold;/* Statistics */INTERNAL_SIZE_T mmapped_mem;INTERNAL_SIZE_T max_mmapped_mem;/* First address handed out by MORECORE/sbrk.  */char *sbrk_base;#if USE_TCACHE/* Maximum number of buckets to use.  */size_t tcache_bins;size_t tcache_max_bytes;/* Maximum number of chunks in each bucket.  */size_t tcache_count;/* Maximum number of chunks to remove from the unsorted list, whicharen't used to prefill the cache.  */size_t tcache_unsorted_limit;
#endif
};

trim_threshold：收缩阈值，默认为128KB，当Top chunk大小超过该值时，调用free将可能引起堆的收缩，减少Top chunk的大小

如下内容摘自：https://www.lihaoranblog.cn/malloc_par/

在一定的条件下，调用free时会收缩内存，减小top chunk的大小。由于mmap分配阈值的动态调整，在free时可能将收缩阈值修改为mmap分配阈值的2倍，在64位系统上，mmap分配阈值最大值为32MB，所以收缩阈值的最大值为64MB，在32位系统上，mmap分配阈值最大值为512KB，所以收缩阈值的最大值为1MB。收缩阈值可以通过函数mallopt()进行设置

top_pad：默认为0，表示分配堆时是否添加了额外的pad

mmap_threshold：mmap分配阈值，默认为128K；32位中最大位512KB，64位中最大位32MB，但mmap会动调调整分配阈值，因此这个值可能会修改

arena_test/arena_max：

如下内容摘自：https://www.lihaoranblog.cn/malloc_par/

arena_test和arena_max用于PER_THREAD优化，在32位系统上arena_test默认值为2，64位系统上的默认值为8，当每个进程的分配区数量小于等于arena_test时，不会重用已有的分配区。为了限制分配区的总数，用arena_max来保存分配区的最大数量，当系统中的分配区数量达到arena_max，就不会再创建新的分配区，只会重用已有的分配区。这两个字段都可以使用mallopt()函数设置。

n_mmaps：当前堆用mmap分配内存块的数量

n_mmaps_max：当前堆用mmap分配内存块的最大数量，默认65536，可修改

no_dyn_threshold：默认为0，表示开启mmap分配阈值动调调整

mmapped_mem/max_mmapped_mem：统计mmap分配的内存大小，通常两值相等

在使用Tcache的情况下：

tcache_bins/tcache_max_bytes：Tcache链表数量，不会超过tcache_max_bytes

tcache_count：每个链表最多可挂的节点数

tcache_unsorted_limit：可从Unsorted Bin中拿出chunk的最大数量

Arena：

可能有的文章会将其翻译成“竞技场”，但笔者仍然会用“Arena”去称呼它

通常，一个线程只会有一个Arena，主线程的叫做Main_Arena，其他线程的叫做Thread_Arena，但Arena的数量并不会随着线程数而无限增加。其数量上限与系统和处理器核心数相关：

32位系统中：Number of arena = 2 * number of cores + 1.
64位系统中：Number of arena = 8 * number of cores + 1

CTF-WIKI： 与 thread 不同的是，main_arena 并不在申请的 heap 中，而是一个全局变量，在 libc.so 的数据段。

正如上述的malloc_par结构体中arena_test参数所说，如果当前Arena的数量小于arena_test，那么堆管理器就会在其他线程创建堆结构的时候为其另外创建一个Arena

但如果数量超过了arena_test，那么只在需要的时候才会创建(比如某线程发现其他Arena全都被占用了，为了不因为等待排队而浪费掉时间，于是另外开辟新的Arena)

笔者对Arena的理解为：

一个Arena包括了一系列的堆（可能有的读者会把额外开辟的空间归并到同一个堆里，但笔者习惯于将额外开辟的内存块称为“新堆”以区别最早初始化时的堆，笔者称之为“主堆”，这样在解释内存收缩时，能够将其理解为“归还主堆以外的堆”）

在之前的调试中也曾发现，main_arena似乎存在于栈上，而在CTF-WIKI中将其描述为全局变量。每个Arena都有自己的一套malloc_state、malloc_par、heap_info结构体，Arena中的一系列堆通过Arena进行管理(这样解释似乎有些怪异因为malloc_state结构体中存在指向Arena的指针，但也有一定的合理性，它在某种程度上方便了笔者的理解)

可以先假设这样一个场景：

某个进程存在两个线程A、B并发运行，存在一个chunk p。现在，线程A进行free(p)，而线程B则要往chunk p处写入一些数据。假设A稍快一点，它先被处理器进行处理，那么系统就要先阻塞线程B的请求，直到线程A的事情已经做完了为止。当线程A结束了，系统就会发现这块地址不可写，然后阻止它进行这个操作

但是，阻塞是一件非常耗时的工作。如果只有少量这种情况发生，似乎也不是不能接受这种开销，但如果需要处理大量的多线程工作，这种阻塞就将带来严重的浪费。

如果我们为不同的线程开辟不同的Arena，每个Arena都有自己的Bins，那么线程B就不需要等待线程A，可以直接操作自己的Arena去申请或是释放chunk

当然，实际调试中会发现，我们开辟的chunk总是在Arena下方(往高地址处)，我们也可以将其理解为每个线程自己的一个“主堆”，这样就能避开那些“可以不使用堆锁的情况”