Linux Kernel Rhapsody: The SLOB Allocator

Linux Kernel Rhapsody

@CopyLeft by ICANTH, I Can do ANy THing that I CAN THink!~

Author: WenHui, Wuhan University, 2012-5-12

Version: v1.0

Last Modified Time: 2012-5-20

SLOB Overview

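SLOB (Simple List Of Blocks) is a traditional K&R/UNIX heap allocator with a SLAB emulation layer on top. Its code is much smaller than that of the original Linux SLAB allocator and it uses memory more efficiently. It is, however, more prone to fragmentation than SLAB, so it is suited only to small systems, embedded systems in particular.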

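The SLOB allocator keeps just three lists of partially free pages (partial free lists): free_slob_small, free_slob_medium and free_slob_large. Each list links pages that have already been handed to SLOB. free_slob_small serves requests smaller than 256 bytes, free_slob_medium serves requests of at least 256 but smaller than 1024 bytes, and free_slob_large serves all remaining requests that are smaller than PAGE_SIZE.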
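The size thresholds above can be illustrated with a minimal C sketch. It only mirrors the order of the comparisons used to choose a list; the enum slob_list_id and the function slob_pick_list are hypothetical names introduced here for illustration and stand in for the three real list heads.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stddef.h>

#define SLOB_BREAK1 256    /* same thresholds as in the source listing */
#define SLOB_BREAK2 1024

/* Hypothetical identifiers standing in for the three list heads. */
enum slob_list_id { SLOB_SMALL, SLOB_MEDIUM, SLOB_LARGE };

/* Pick the partial-free list for a request of `size` bytes. */
static enum slob_list_id slob_pick_list(size_t size)
{
    if (size < SLOB_BREAK1)
        return SLOB_SMALL;
    if (size < SLOB_BREAK2)
        return SLOB_MEDIUM;
    return SLOB_LARGE;
}
```

Under these assumptions a 255-byte request maps to the small list, while 256 and 1024 bytes map to the medium and large lists respectively, because both comparisons are strict.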

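When an object of a page or more is requested through SLOB's kmalloc interface, the allocator calls the zone memory allocator (alloc_pages) directly to obtain contiguous pages (compound pages) and returns them. The flags field of the first page structure is updated accordingly, and its private field records the size of the allocation.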

Because SLOB reuses the SLAB-related fields of struct page, it wraps struct page in a C union and reinterprets it as struct slob_page to avoid confusion. The key fields of slob_page are:

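slobidx_t units	Number of free units left in the slob page. A unit is one slobidx_t, that is, one slob_block; its size depends on PAGE_SIZE and is 16 bits when PAGE_SIZE = 4 KB. SLOB divides a page into units of sizeof(slobidx_t) bytes.
slob_t *free	The free blocks inside a slob page are chained in ascending address order; free points to the first block of that chain.
list_head list	Links the slob page into a list of partially free slob pages.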
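As a rough userspace illustration of the unit arithmetic, the sketch below assumes a 4 KB page and a 16-bit slobidx_t. DEMO_PAGE_SIZE, units_for and units_per_page are names made up for this example; the two macros follow the definitions that appear in the source listing.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stddef.h>
#include <stdint.h>

typedef int16_t slobidx_t;                  /* s16 in the listing for 4 KB pages */
typedef struct slob_block { slobidx_t units; } slob_t;

#define DEMO_PAGE_SIZE 4096                 /* assumed page size */
#define SLOB_UNIT sizeof(slob_t)
#define SLOB_UNITS(size) (((size) + SLOB_UNIT - 1) / SLOB_UNIT)

/* Number of units needed to hold `bytes` bytes (rounded up). */
static size_t units_for(size_t bytes)
{
    return SLOB_UNITS(bytes);
}

/* Number of units in one page under the assumptions above. */
static size_t units_per_page(void)
{
    return SLOB_UNITS(DEMO_PAGE_SIZE);
}
```

With a 2-byte unit, a 5-byte request rounds up to 3 units and a whole 4 KB page holds 2048 units, which is why a signed 16-bit index is enough to address any unit in the page.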

The SLOB allocator's data structures are shown in the figure below:

Figure: SLOB allocator data structures

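The memory of the physical page frame behind a slob page is divided into fixed-size UNITs. A BLOCK is the basic allocation object in SLOB and is either free or allocated. The size of a BLOCK is always a whole number of UNITs, and its start address must be UNIT-aligned. A BLOCK is described by a slob_t at its head, and a slob_t is one UNIT in size.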

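For a free block with BLOCK_SIZE = 1 UNIT there is only one slob_t descriptor, and it stores the negated offset (-offset, in UNITs) of the next free block relative to the start of the page frame. For BLOCK_SIZE > 1 UNIT the head of the BLOCK holds two slob_t entries: slob_t[0] is BLOCK_SIZE in UNITs and slob_t[1] is the positive offset (+offset) of the next free block. An offset that points past the end of the current page, landing on a page boundary, means that the next block is NULL.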
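The encoding described above can be tried out in user space. The following is a minimal sketch, assuming 4 KB pages and a 16-bit slobidx_t; demo_page, page_base and the DEMO_* macros are illustrative names that are not part of the kernel source, and a statically aligned array stands in for a real page frame.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL                       /* assumed page size */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

typedef int16_t slobidx_t;
typedef struct slob_block { slobidx_t units; } slob_t;

#define UNITS_PER_PAGE (DEMO_PAGE_SIZE / sizeof(slob_t))

/* One page-aligned array of units standing in for a page frame. */
static _Alignas(4096) slob_t demo_page[UNITS_PER_PAGE];

/* Start of the page that contains s. */
static slob_t *page_base(slob_t *s)
{
    return (slob_t *)((uintptr_t)s & DEMO_PAGE_MASK);
}

/* Encode size and next pointer into the header of free block s. */
static void set_slob(slob_t *s, slobidx_t size, slob_t *next)
{
    slobidx_t offset = (slobidx_t)(next - page_base(s));

    if (size > 1) {
        s[0].units = size;      /* first unit: size in units */
        s[1].units = offset;    /* second unit: +offset of next block */
    } else {
        s[0].units = -offset;   /* single unit: -offset of next block */
    }
}

/* Size of a free block in units. */
static slobidx_t slob_units(slob_t *s)
{
    return s->units > 0 ? s->units : 1;
}

/* Next free block after s. */
static slob_t *slob_next(slob_t *s)
{
    slobidx_t next = s[0].units < 0 ? -s[0].units : s[1].units;

    return page_base(s) + next;
}

/* True if the next pointer lands on a page boundary, i.e. s is the last free block. */
static int slob_last(slob_t *s)
{
    return !((uintptr_t)slob_next(s) & ~DEMO_PAGE_MASK);
}
```

In this model a 1-unit block at unit 10 that points to unit 20 stores -20 in its only unit, while an 8-unit block at unit 20 with no successor stores 8 followed by 2048, the unit offset of the next page boundary.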

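The layout of an allocated block depends on whether it was allocated through KMALLOC or through KMEM_CACHE. For KMALLOC, the allocated BLOCK is laid out as follows:

A KMALLOC BLOCK begins with an align-sized area that stores the size of the object; the address returned to the caller is the start of the object, just past that area. A KMEM_CACHE BLOCK whose cache uses RCU freeing ends with a slob_rcu structure, which is used to free the BLOCK through the RCU mechanism; its size field records the size of the BLOCK, footer included.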
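To make the kmalloc layout concrete, here is a hedged userspace sketch in which malloc plays the role of slob_alloc. DEMO_ALIGN is an assumed value standing in for max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN), and the demo_* functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stdlib.h>

#define DEMO_ALIGN 8   /* assumption: stand-in for the kernel's minimum alignment */

/* Allocate `size` bytes and record the size in a header in front of the object. */
static void *demo_kmalloc(unsigned int size)
{
    unsigned int *m = malloc(size + DEMO_ALIGN);   /* malloc stands in for slob_alloc */

    if (!m)
        return NULL;
    *m = size;                       /* header: object size */
    return (char *)m + DEMO_ALIGN;   /* caller sees the object, not the header */
}

/* Read the recorded size back from the header, as kfree and ksize need to. */
static unsigned int demo_object_size(const void *obj)
{
    const unsigned int *m = (const unsigned int *)((const char *)obj - DEMO_ALIGN);

    return *m;
}

/* Step back to the start of the block before releasing it. */
static void demo_kfree(void *obj)
{
    if (obj)
        free((char *)obj - DEMO_ALIGN);
}
```

The cost of this layout is DEMO_ALIGN bytes per object, which is the overhead that the KMEM_CACHE path avoids because every object of a cache has the same, known size.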

SLOB Function Reference

The main SLOB functions are summarized in the tables below:

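SLOB function	Purpose
set_slob	Sets up the slob_t header of a free block: the size of the block (in units, not bytes) and the offset of the next free block relative to the start of the containing page.
slob_page	Converts the struct page of the page frame behind a virtual address into a struct slob_page.
slob_last	Tells whether a free block is the last one on the free list by checking whether its next offset falls outside the page.
slob_page_alloc	Searches the free list of a given page and allocates a free block of suitable size.
slob_new_pages	Calls alloc_pages or alloc_pages_exact_node, depending on node, to get 2^order contiguous pages from the buddy allocator.
slob_alloc	Allocation entry point of the slob allocator: allocates one block from slob.
slob_free	Free entry point of the slob allocator: returns a block to slob.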

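KMALLOC function	Purpose
__kmalloc_node	Wrapper around slob_alloc. If the requested size plus the alignment area reaches a page, it allocates 2^order contiguous pages straight from the buddy allocator; otherwise it calls slob_alloc to allocate a block, and the align area stores the size of the object in the block.
kfree	Frees a block; wrapper around slob_free.
ksize	Returns the space occupied by an allocation. For page-sized and larger allocations it returns the exact object size recorded at allocation time; for a slob block it returns the object size rounded up to whole units.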

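KMEM_CACHE function	Purpose
kmem_cache_create	Allocates a kmem_cache object from the slob allocator and initializes it.
kmem_cache_destroy	Returns the kmem_cache memory to the slob allocator.
kmem_cache_alloc_node	Allocates a block according to the object information in the kmem_cache. If the cache's size is smaller than a page, the block comes from slob_alloc; otherwise it comes straight from the buddy allocator.
__kmem_cache_free	Frees a block that was allocated through a kmem_cache.
kmem_rcu_free	Frees a block that was allocated through a kmem_cache, using RCU.
kmem_cache_free	The kmem_cache interface for freeing a block; the flags field of the kmem_cache decides whether the block is freed through kmem_rcu_free or through __kmem_cache_free.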

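The SLOB allocator exposes two interfaces: KMEM_CACHE and KMALLOC. KMALLOC takes an object size and allocates a BLOCK directly from the three partial-free slob page lists. KMEM_CACHE requires a kmem_cache object to be created first; it stores the fixed block size, the constructor ctor, and so on.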

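The two core functions of the SLOB allocator are slob_alloc and slob_free; their flow is shown in the figures below. Requests of about PAGE_SIZE or more do not go through the slob lists at all: the front ends call alloc_pages() directly (kmalloc asks for compound pages). For smaller requests, slob_alloc finds a BLOCK with BLOCK_SIZE >= REQUIRE_UNITS, carves REQUIRE_UNITS out of it, unlinks the BLOCK from the page's free block list, and puts whatever remains back on that list as a smaller free block. If the slob page is now completely full, it is removed from the free slob page list.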

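On the free side, allocations of about PAGE_SIZE or more are handed straight back to the page allocator. For smaller blocks, slob_free inserts the BLOCK into the page's free block list, merging it with adjacent free blocks. If the slob page is now completely free, free_pages() is called to release the page. If the slob page was completely full before, it is put back on the free_slob_small partial list.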

Figure: flowchart of slob_alloc

Figure: flowchart of slob_free
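The allocation path inside one page can be sketched as follows. This is a simplified userspace model, not the kernel function: it leaves out alignment handling, locking and the page lists, and struct demo_page, demo_page_init and demo_page_alloc are hypothetical names. It assumes 4 KB pages and a 16-bit slobidx_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

typedef int16_t slobidx_t;
typedef struct slob_block { slobidx_t units; } slob_t;

#define SLOB_UNIT sizeof(slob_t)
#define SLOB_UNITS(size) (((size) + SLOB_UNIT - 1) / SLOB_UNIT)
#define UNITS_PER_PAGE (DEMO_PAGE_SIZE / SLOB_UNIT)

/* Hypothetical stand-in for a slob page: the page memory plus its bookkeeping. */
struct demo_page {
    _Alignas(4096) slob_t mem[UNITS_PER_PAGE];
    slobidx_t units;    /* free units left in the page */
    slob_t *free;       /* first free block */
};

static slob_t *page_base(slob_t *s)
{
    return (slob_t *)((uintptr_t)s & DEMO_PAGE_MASK);
}

static void set_slob(slob_t *s, slobidx_t size, slob_t *next)
{
    slobidx_t offset = (slobidx_t)(next - page_base(s));

    if (size > 1) {
        s[0].units = size;
        s[1].units = offset;
    } else {
        s[0].units = -offset;
    }
}

static slobidx_t slob_units(slob_t *s)
{
    return s->units > 0 ? s->units : 1;
}

static slob_t *slob_next(slob_t *s)
{
    return page_base(s) + (s[0].units < 0 ? -s[0].units : s[1].units);
}

static int slob_last(slob_t *s)
{
    return !((uintptr_t)slob_next(s) & ~DEMO_PAGE_MASK);
}

/* Start with one free block that spans the whole page. */
static void demo_page_init(struct demo_page *p)
{
    p->units = (slobidx_t)UNITS_PER_PAGE;
    p->free = p->mem;
    set_slob(p->mem, (slobidx_t)UNITS_PER_PAGE, p->mem + UNITS_PER_PAGE);
}

/* First-fit scan of the page's free list; no alignment handling. */
static void *demo_page_alloc(struct demo_page *p, size_t size)
{
    slobidx_t units = (slobidx_t)SLOB_UNITS(size);
    slob_t *prev = NULL, *cur;

    assert(size > 0);
    if (p->units < units)
        return NULL;                    /* also covers a completely full page */
    for (cur = p->free; ; prev = cur, cur = slob_next(cur)) {
        slobidx_t avail = slob_units(cur);

        if (avail >= units) {
            slob_t *next = slob_next(cur);
            slob_t *rest = next;

            if (avail > units) {        /* split: the remainder stays free */
                rest = cur + units;
                set_slob(rest, (slobidx_t)(avail - units), next);
            }
            if (prev)
                set_slob(prev, slob_units(prev), rest);
            else
                p->free = rest;
            p->units -= units;
            return cur;
        }
        if (slob_last(cur))
            return NULL;                /* end of list, nothing fits */
    }
}
```

In this model a page that has been initialised with demo_page_init hands out blocks from the front: a 100-byte request takes the first 50 units, and the remaining 1998 units become the new head of the free list.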
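The free path within a single page, including the merging of neighbouring free blocks, can be modelled in user space as well. The sketch below is an approximation under stated assumptions (4 KB pages, 16-bit slobidx_t, no locking, no return of empty pages to the page allocator); struct demo_page, demo_page_init_full and demo_page_free are hypothetical names, and "page was full" is detected here by a zero unit count rather than by a page flag.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

typedef int16_t slobidx_t;
typedef struct slob_block { slobidx_t units; } slob_t;

#define UNITS_PER_PAGE (DEMO_PAGE_SIZE / sizeof(slob_t))

/* Hypothetical stand-in for a slob page: the page memory plus its bookkeeping. */
struct demo_page {
    _Alignas(4096) slob_t mem[UNITS_PER_PAGE];
    slobidx_t units;    /* free units left in the page */
    slob_t *free;       /* first free block */
};

static slob_t *page_base(slob_t *s)
{
    return (slob_t *)((uintptr_t)s & DEMO_PAGE_MASK);
}

static void set_slob(slob_t *s, slobidx_t size, slob_t *next)
{
    slobidx_t offset = (slobidx_t)(next - page_base(s));

    if (size > 1) {
        s[0].units = size;
        s[1].units = offset;
    } else {
        s[0].units = -offset;
    }
}

static slobidx_t slob_units(slob_t *s)
{
    return s->units > 0 ? s->units : 1;
}

static slob_t *slob_next(slob_t *s)
{
    return page_base(s) + (s[0].units < 0 ? -s[0].units : s[1].units);
}

static int slob_last(slob_t *s)
{
    return !((uintptr_t)slob_next(s) & ~DEMO_PAGE_MASK);
}

/* Model a page whose units are all allocated. */
static void demo_page_init_full(struct demo_page *p)
{
    p->units = 0;
    p->free = p->mem + UNITS_PER_PAGE;   /* end-of-page sentinel */
}

/* Put block b (units long) back on the address-ordered free list, merging neighbours. */
static void demo_page_free(struct demo_page *p, slob_t *b, slobidx_t units)
{
    slob_t *prev, *next;

    if (p->units == 0) {                 /* page was full: b is the only free block */
        p->units = units;
        p->free = b;
        set_slob(b, units, p->mem + UNITS_PER_PAGE);
        return;
    }
    p->units += units;
    if (b < p->free) {                   /* b becomes the new list head */
        if (b + units == p->free) {      /* merge with the old head */
            units += slob_units(p->free);
            p->free = slob_next(p->free);
        }
        set_slob(b, units, p->free);
        p->free = b;
        return;
    }
    prev = p->free;
    next = slob_next(prev);
    while (b > next) {                   /* find prev < b < next */
        prev = next;
        next = slob_next(prev);
    }
    if (!slob_last(prev) && b + units == next) {   /* merge with following block */
        units += slob_units(next);
        set_slob(b, units, slob_next(next));
    } else {
        set_slob(b, units, next);
    }
    if (prev + slob_units(prev) == b)              /* merge with preceding block */
        set_slob(prev, (slobidx_t)(slob_units(prev) + slob_units(b)), slob_next(b));
    else
        set_slob(prev, slob_units(prev), b);
}
```

Freeing two separate 4-unit blocks and then the 6-unit gap between them leaves a single 14-unit free block in this model, which shows both merge steps acting in one call.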

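For KMEM_CACHE the allocation and free functions are kmem_cache_alloc_node and kmem_cache_free. The kmem_cache describes the BLOCK that kmem_cache_alloc_node has to allocate and can also initialize the object through its constructor. If the flags of the kmem_cache select RCU freeing, room for a slob_rcu object is reserved in every BLOCK. When such a BLOCK is released with kmem_cache_free, call_rcu registers kmem_rcu_free as a callback and the BLOCK is freed asynchronously.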
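The pointer arithmetic behind the RCU footer can be shown in isolation. The following is a small sketch under assumptions: struct demo_rcu_head is a made-up placeholder for the kernel's struct rcu_head, struct demo_slob_rcu mimics a footer with a head and a size field, and footer_of and block_of are hypothetical helper names. No actual RCU machinery is involved.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stddef.h>
#include <stdlib.h>

/* Placeholder for struct rcu_head; only its presence in the footer matters here. */
struct demo_rcu_head {
    struct demo_rcu_head *next;
    void (*func)(struct demo_rcu_head *);
};

/* Footer kept at the tail of a block: block = [object | footer]. */
struct demo_slob_rcu {
    struct demo_rcu_head head;
    int size;                  /* size of the whole block, footer included */
};

/* From block start and block size to the footer, as done when the free is queued. */
static struct demo_slob_rcu *footer_of(void *block, size_t block_size)
{
    return (struct demo_slob_rcu *)((char *)block +
                                    (block_size - sizeof(struct demo_slob_rcu)));
}

/* From the footer back to the block start, as done in the deferred callback. */
static void *block_of(struct demo_slob_rcu *f)
{
    return (char *)f - ((size_t)f->size - sizeof(struct demo_slob_rcu));
}
```

Because the callback only receives a pointer to the footer, the block size has to travel inside the footer itself; that is the reason for the size field.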

SLOB Allocator Source Code

001 /*

002 * SLOB Allocator: Simple List Of Blocks

003 *

004 * Matt Mackall <mpm@selenic.com> 12/30/03

005 *

006 * NUMA support by Paul Mundt, 2007.

007 *

008 * How SLOB works:

009 *

010 * The core of SLOB is a traditional K&R style heap allocator, with

011 * support for returning aligned objects. The granularity of this

012 * allocator is as little as 2 bytes, however typically most architectures

013 * will require 4 bytes on 32-bit and 8 bytes on 64-bit.

014 *

015 * The slob heap is a set of linked list of pages from alloc_pages(),

016 * and within each page, there is a singly-linked list of free blocks

017 * (slob_t). The heap is grown on demand. To reduce fragmentation,

018 * heap pages are segregated into three lists, with objects less than

019 * 256 bytes, objects less than 1024 bytes, and all other objects.

020 *

021 * Allocation from heap involves first searching for a page with

022 * sufficient free blocks (using a next-fit-like approach) followed by

023 * a first-fit scan of the page. Deallocation inserts objects back

024 * into the free list in address order, so this is effectively an

025 * address-ordered first fit.

026 *

027 * Above this is an implementation of kmalloc/kfree. Blocks returned

028 * from kmalloc are prepended with a 4-byte header with the kmalloc size.

029 * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls

030 * alloc_pages() directly, allocating compound pages so the page order

031 * does not have to be separately tracked, and also stores the exact

032 * allocation size in page->private so that it can be used to accurately

033 * provide ksize(). These objects are detected in kfree() because slob_page()

034 * is false for them.

035 *

036 * SLAB is emulated on top of SLOB by simply calling constructors and

037 * destructors for every SLAB allocation. Objects are returned with the

038 * 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which

039 * case the low-level allocator will fragment blocks to create the proper

040 * alignment. Again, objects of page-size or greater are allocated by

041 * calling alloc_pages(). As SLAB objects know their size, no separate

042 * size bookkeeping is necessary and there is essentially no allocation

043 * space overhead, and compound pages aren't needed for multi-page

044 * allocations.

045 *

046 * NUMA support in SLOB is fairly simplistic, pushing most of the real

047 * logic down to the page allocator, and simply doing the node accounting

048 * on the upper levels. In the event that a node id is explicitly

049 * provided, alloc_pages_exact_node() with the specified node id is used

050 * instead. The common case (or when the node id isn't explicitly provided)

051 * will default to the current node, as per numa_node_id().

052 *

053 * Node aware pages are still inserted in to the global freelist, and

054 * these are scanned for by matching against the node id encoded in the

055 * page flags. As a result, block allocations that can be satisfied from

056 * the freelist will only be done so on pages residing on the same node,

057 * in order to prevent random node placement.

058 */

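SLOB targets embedded systems and is usually counted among the slab-style allocators. Its code is very small and simple, and its management is far less elaborate than that of the Linux SLAB allocator; for example, it has no per-CPU caches. To reduce fragmentation, SLOB links pages into three lists by object size: < 256 bytes, < 1024 bytes, and everything else. Objects of PAGE_SIZE or more are allocated directly with alloc_pages; for kmalloc the exact size is stored in the private field of the first page. All free objects inside a page are chained into one free list, each described by a slob_t.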

059

060 #include <linux/kernel.h>

061 #include <linux/slab.h>

062 #include <linux/mm.h>

063 #include <linux/swap.h> /* struct reclaim_state */

064 #include <linux/cache.h>

065 #include <linux/init.h>

066 #include <linux/module.h>

067 #include <linux/rcupdate.h>

068 #include <linux/list.h>

069 #include <linux/kmemtrace.h>

070 #include <linux/kmemleak.h>

071 #include <asm/atomic.h>

072

073 /*

074 * slob_block has a field 'units', which indicates size of block if +ve,

075 * or offset of next block if -ve (in SLOB_UNITs).

076 *

077 * Free blocks of size 1 unit simply contain the offset of the next block.

078 * Those with larger size contain their size in the first SLOB_UNIT of

079 * memory, and the offset of the next free block in the second SLOB_UNIT.

080 */

081 #if PAGE_SIZE <= (32767 * 2)

082 typedef s16 slobidx_t;

083 #else

084 typedef s32 slobidx_t;

085 #endif

086

087 struct slob_block {

088 slobidx_t units;

089 };

090 typedef struct slob_block slob_t;

/*

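slob_page is a reinterpretation of struct page. Setting the two structures side by side gives the mapping between page and slob_page; I have annotated the matching slob_page fields in the struct page excerpt below. With 4 KB pages slobidx_t is 2 bytes, so the correspondence is off by 2 bytes at units, but this does no harm to struct page, because the space involved belongs to fields used only by the SLUB allocator.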

034 struct page {

035         unsigned long flags; /* slob_page: flags */

037         atomic_t _count; /* slob_page: _count */

038         union { /* slob_page: units (slobidx_t may be only 2 bytes) */

039                 atomic_t _mapcount;

043                 struct {

044                         u16 inuse;

045                         u16 objects;

046                 };

047         };

048         union { /* slob_page: pad[2] */

049             struct {

050                 unsigned long private;

057                 struct address_space *mapping;

064             };

068             struct kmem_cache *slab;

069             struct page *first_page;

070         };

071         union { /* slob_page: free, first free slob_t in page */

072                 pgoff_t index;

073                 void *freelist;

074         };

075         struct list_head lru; /* slob_page: list, linked list of free pages */

};

*/

092 /*

093 * We use struct page fields to manage some slob allocation aspects,

094 * however to avoid the horrible mess in include/linux/mm_types.h, we'll

095 * just define our own struct page type variant here.

096 */

097 struct slob_page {

098 union {

099 struct {

100 unsigned long flags; /* mandatory */

101 atomic_t _count; /* mandatory */

102 slobidx_t units; /* free units left in page */

103 unsigned long pad[2];

104 slob_t *free; /* first free slob_t in page */

105 struct list_head list; /* linked list of free pages */

106 };

107 struct page page;

108 };

109 };

110 static inline void struct_slob_page_wrong_size(void)

111 { BUILD_BUG_ON(sizeof(struct slob_page) != sizeof(struct page)); }

112

113 /*

114 * free_slob_page: call before a slob_page is returned to the page allocator.

115 */

116 static inline void free_slob_page(struct slob_page *sp)

117 {

118 reset_page_mapcount(&sp->page);

119 sp->page.mapping = NULL;

120 }

121

122 /*

123 * All partially free slob pages go on these lists.

124 */

125 #define SLOB_BREAK1 256

126 #define SLOB_BREAK2 1024

127 static LIST_HEAD(free_slob_small);

128 static LIST_HEAD(free_slob_medium);

129 static LIST_HEAD(free_slob_large);

130

131 /*

132 * is_slob_page: True for all slob pages (false for bigblock pages)

133 */

134 static inline int is_slob_page(struct slob_page *sp)

135 {

136 return PageSlab((struct page *)sp);

137 }

138

139 static inline void set_slob_page(struct slob_page *sp)

140 {

141 __SetPageSlab((struct page *)sp);

142 }

143

144 static inline void clear_slob_page(struct slob_page *sp)

145 {

146 __ClearPageSlab((struct page *)sp);

147 }

/* Convert a virtual address to the address of its struct page, and view that page as a struct slob_page */

149 static inline struct slob_page *slob_page(const void *addr)

150 {

151 return (struct slob_page *)virt_to_page(addr);

152 }

153

154 /*

155 * slob_page_free: true for pages on free_slob_pages list.

156 */

157 static inline int slob_page_free(struct slob_page *sp)

158 {

159 return PageSlobFree((struct page *)sp);

160 }

161

/* Add the slob page to the given free slob list */

162 static void set_slob_page_free(struct slob_page *sp, struct list_head *list)

163 {

164 list_add(&sp->list, list);

165 __SetPageSlobFree((struct page *)sp);

166 }

167

168 static inline void clear_slob_page_free(struct slob_page *sp)

169 {

170 list_del(&sp->list);

171 __ClearPageSlobFree((struct page *)sp);

172 }

173

174 #define SLOB_UNIT sizeof(slob_t)

175 #define SLOB_UNITS(size) (((size) + SLOB_UNIT - 1)/SLOB_UNIT)

176 #define SLOB_ALIGN L1_CACHE_BYTES

177

178 /*

179 * struct slob_rcu is inserted at the tail of allocated slob blocks, which

180 * were created with a SLAB_DESTROY_BY_RCU slab. slob_rcu is used to free

181 * the block using call_rcu.

182 */

183 struct slob_rcu {

184 struct rcu_head head;

189 * slob_lock protects all slob allocator structures.

190 */

191 static DEFINE_SPINLOCK(slob_lock);

192

193 /*

194 * Encode the given size and next info into a free slob block s.

195 */

196 static void set_slob(slob_t *s, slobidx_t size, slob_t *next)

197 {

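/* Get base, the start address of the page frame that holds the free slob_t, and compute the offset of the next slob relative to base. Why not store the virtual address directly? On a 32-bit machine a virtual address takes 32 bits, whereas with PAGE_SIZE = 4 KB (12 bits) the offset in units fits in 16 bits, which saves 2 bytes. */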

198 slob_t *base = (slob_t *)((unsigned long)s & PAGE_MASK);

199 slobidx_t offset = next - base;

/* When the block is larger than 1 unit, its header uses two units: s[0] holds the size and s[1] holds the positive offset of the next free block */

201 if (size > 1) {

202 s[0].units = size;

203 s[1].units = offset;

204 } else

/* When the block is exactly 1 unit, its header is a single unit holding -offset; the negative value marks a 1-unit free block */

205 s[0].units = -offset;

206 }

207

208 /*

209 * Return the size of a slob block.

210 */

211 static slobidx_t slob_units(slob_t *s)

212 {

213 if (s->units > 0)

219 * Return the next free slob block pointer after this one.

220 */

221 static slob_t *slob_next(slob_t *s)

222 {

223 slob_t *base = (slob_t *)((unsigned long)s & PAGE_MASK);

224 slobidx_t next;

225

226 if (s[0].units < 0)

227 next = -s[0].units;

228 else

229 next = s[1].units;

230 return base+next;

231 }

232

233 /*

234 * Returns true if s is the last free block in its page.

235 */

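/* A next offset that falls outside the page is invalid. With 4 KB pages ~PAGE_MASK = 2^12 - 1, so when next is set to the start of the following physical page (a byte offset of 2^12) its low 12 bits are zero, which marks the current slob_t as the tail of the free block list */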

236 static int slob_last(slob_t *s)

237 {

238 return !((unsigned long)slob_next(s) & ~PAGE_MASK);

239 }

240

241 static void *slob_new_pages(gfp_t gfp, int order, int node)

242 {

243 void *page;

244

245 #ifdef CONFIG_NUMA

246 if (node != -1)

247 page = alloc_pages_exact_node(node, gfp, order);

248 else

249 #endif

250 page = alloc_pages(gfp, order);

251

252 if (!page)

253 return NULL;

/* Note: only low memory can be allocated here */

255 return page_address(page);

256 }

257

258 static void slob_free_pages(void *b, int order)

259 {

/* Update the slab reclaim accounting of the current task */

260 if (current->reclaim_state)

261 current->reclaim_state->reclaimed_slab += 1 << order;

262 free_pages((unsigned long)b, order);

263 }

264

265 /*

266 * Allocate a slob block within a given slob_page sp.

267 */

268 static void *slob_page_alloc(struct slob_page *sp, size_t size, int align)

269 {

270 slob_t *prev, *cur, *aligned = NULL;

/* units = the object size in bytes converted to a number of units. While a block is free, its slob_t header reuses the space at the head of the block */

271 int delta = 0, units = SLOB_UNITS(size);

272

273 for (prev = NULL, cur = sp->free; ; prev = cur, cur = slob_next(cur)) {

274 slobidx_t avail = slob_units(cur);

275

276 if (align) {

/* Round cur up to an align-byte boundary */

277 aligned = (slob_t *)ALIGN((unsigned long)cur, align);

/* Subtracting slob_t pointers yields the number of slob_t elements between them */

278 delta = aligned - cur;

279 }

/* The block has enough room for the object after alignment */

280 if (avail >= units + delta) { /* room enough? */

281 slob_t *next;

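/* If alignment splits the current slob into two, make the second one (aligned) point to next and the first one (the original cur) point to aligned, then advance prev and cur to these two slobs */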

283 if (delta) { /* need to fragment head to align? */

284 next = slob_next(cur);

285 set_slob(aligned, avail - delta, next);

286 set_slob(cur, delta, aligned);

287 prev = cur;

288 cur = aligned;

289 avail = slob_units(cur);

290 }

291

292 next = slob_next(cur);

/* If the new cur matches the requested size exactly, simply unlink cur from the free list */

293 if (avail == units) { /* exact fit? unlink. */

294 if (prev)

295 set_slob(prev, slob_units(prev), next);

296 else

297 sp->free = next;

298 } else { /* fragment */

/* Otherwise the slob is larger than requested and leaves a fragment: unlink cur from the free list and link the fragment in its place */

299 if (prev)

300 set_slob(prev, slob_units(prev), cur + units);

301 else

302 sp->free = cur + units;

303 set_slob(cur + units, avail - units, next);

304 }

306 sp->units -= units;

307 if (!sp->units)

308 clear_slob_page_free(sp);

309 return cur;

310 }

/* If the end of the free list is reached without finding a free block large enough for the object, stop and return */

311 if (slob_last(cur))

317 * slob_alloc: entry point into the slob allocator.

318 */

319 static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)

320 {

321 struct slob_page *sp;

322 struct list_head *prev;

323 struct list_head *slob_list;

324 slob_t *b = NULL;

325 unsigned long flags;

326

327 if (size < SLOB_BREAK1)

328 slob_list = &free_slob_small;

329 else if (size < SLOB_BREAK2)

330 slob_list = &free_slob_medium;

331 else

332 slob_list = &free_slob_large;

333

334 spin_lock_irqsave(&slob_lock, flags);

335 /* Iterate through each partially free page, try to find room */

/* Walk the partially free slob_pages on slob_list and call slob_page_alloc to try to allocate a free slob block of the given size */

336 list_for_each_entry(sp, slob_list, list) {

337 #ifdef CONFIG_NUMA

338 /*

339 * If there's a node specification, search for a partial

340 * page with a matching node id in the freelist.

341 */

342 if (node != -1 && page_to_nid(&sp->page) != node)

343 continue;

344 #endif

345 /* Enough room on this page? */

346 if (sp->units < SLOB_UNITS(size))

347 continue;

348

349 /* Attempt to alloc */

350 prev = sp->list.prev;

351 b = slob_page_alloc(sp, size, align);

352 if (!b)

353 continue;

354

355 /* Improve fragment distribution and reduce our average

356 * search time by starting our next search here. (see

357 * Knuth vol 1, sec 2.5, pg 449) */

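/* A block has now been allocated from sp. If sp is neither the first nor the last element of the list, move the head of slob_list to just before sp, following the algorithm in Knuth's book, so that the next search starts at sp; this lowers the average search time and spreads fragments more evenly */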

358 if (prev != slob_list->prev &&

359 slob_list->next != prev->next)

360 list_move_tail(slob_list, prev->next);

361 break;

362 }

363 spin_unlock_irqrestore(&slob_lock, flags);

/* If no slob page on slob_list has a suitable free block, create a new slob page */

365 /* Not enough space: must allocate a new page */

366 if (!b) {

367 b = slob_new_pages(gfp & ~__GFP_ZERO, 0, node);

368 if (!b)

369 return NULL;

/* View the page frame behind the newly allocated memory as a slob_page, and call __SetPageSlab to set the corresponding flag in the page's flags */

370 sp = slob_page(b);

371 set_slob_page(sp);

372

373 spin_lock_irqsave(&slob_lock, flags);

/* Set the unit count of the slob page and the head of its free list */

374 sp->units = SLOB_UNITS(PAGE_SIZE);

375 sp->free = b;

376 INIT_LIST_HEAD(&sp->list);

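/* Write the slob_t header of the single free block that spans the whole page. Setting the next offset to the unit offset of the following page marks b as the last slob block on sp's free list */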

377 set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));

/* Add the slob page of the newly allocated page to slob_list */

378 set_slob_page_free(sp, slob_list);

379 b = slob_page_alloc(sp, size, align);

380 BUG_ON(!b);

381 spin_unlock_irqrestore(&slob_lock, flags);

382 }

383 if (unlikely((gfp & __GFP_ZERO) && b))

384 memset(b, 0, size);

389 * slob_free: entry point into the slob allocator.

390 */

391 static void slob_free(void *block, int size)

392 {

393 struct slob_page *sp;

394 slob_t *prev, *next, *b = (slob_t *)block;

395 slobidx_t units;

396 unsigned long flags;

397

398 if (unlikely(ZERO_OR_NULL_PTR(block)))

399 return;

400 BUG_ON(!size);

401

402 sp = slob_page(block);

403 units = SLOB_UNITS(size);

404

405 spin_lock_irqsave(&slob_lock, flags);

/* If freeing this block leaves the whole slob page empty, release the slob page */

407 if (sp->units + units == SLOB_UNITS(PAGE_SIZE)) {

408 /* Go directly to page allocator. Do not pass slob allocator */

/* If the page is on a free slob page list, remove it from the list and clear the related page flag with __ClearPageSlobFree */

409 if (slob_page_free(sp))

410 clear_slob_page_free(sp);

411 spin_unlock_irqrestore(&slob_lock, flags);

/* __ClearPageSlab */

412 clear_slob_page(sp);

/* reset_page_mapcount(), and clear the mapping field of the page */

413 free_slob_page(sp);

/* Update the reclaim_state accounting of the current task and call free_pages to return the page to the zone memory allocator */

414 slob_free_pages(b, 0);

415 return;

416 }

/* Otherwise the slob page still holds other blocks that have not been freed, so it is either partially full or completely full */

/* If the slob page is not on a free slob page list (it was completely full), add it to the free_slob_small list */

418 if (!slob_page_free(sp)) {

419 /* This slob page is about to become partially free. Easy! */

420 sp->units = units;

421 sp->free = b;

422 set_slob(b, units,

423 (void *)((unsigned long)(b +

424 SLOB_UNITS(PAGE_SIZE)) & PAGE_MASK)); /* link the block, now the only free one, into the slob page's free list */

425 set_slob_page_free(sp, &free_slob_small);

426 goto out;

427 }

428

429 /*

430 * Otherwise the page is already partially free, so find reinsertion

431 * point.

432 */

/* The slob page was already partially full */

433 sp->units += units;

/* If the block being freed lies before the first block of the free list, insert it and update the list head. If it is directly adjacent to that first block, the two are also merged */

435 if (b < sp->free) {

436 if (b + units == sp->free) {

437 units += slob_units(sp->free);

438 sp->free = slob_next(sp->free);

439 }

440 set_slob(b, units, sp->free);

441 sp->free = b;

442 } else {

/* The free block list is kept in ascending address order, so first find the right insertion point, then update the list. */

443 prev = sp->free;

444 next = slob_next(prev);

445 while (b > next) {

446 prev = next;

447 next = slob_next(prev);

448 }

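/* At this point the addresses satisfy prev < block <= next */

/* If next exists (prev is not the last free block) and block ends exactly where next begins, merge next into block and link block to the block after next */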

450 if (!slob_last(prev) && b + units == next) {

451 units += slob_units(next);

452 set_slob(b, units, slob_next(next));

453 } else

/* Otherwise simply link block to next */

454 set_slob(b, units, next);

455

/* If prev and block are adjacent, merge block into prev and link prev to whatever block was linked to */

456 if (prev + slob_units(prev) == b) {

457 units = slob_units(b) + slob_units(prev);

458 set_slob(prev, units, slob_next(b));

459 } else

/* Otherwise simply link prev to block */

460 set_slob(prev, slob_units(prev), b);

461 }

462 out:

463 spin_unlock_irqrestore(&slob_lock, flags);

464 }

465

466 /*

467 * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.

468 */

469

470 void *__kmalloc_node(size_t size, gfp_t gfp, int node)

471 {

472 unsigned int *m;

473 int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);

474 void *ret;

475

476 lockdep_trace_alloc(gfp);

/* If the requested size plus the align area still fits within a page, call the slob entry point slob_alloc to allocate the block. The align area stores the object size, so block_size >= obj_size + align_size */

478 if (size < PAGE_SIZE - align) {

479 if (!size)

480 return ZERO_SIZE_PTR;

481

482 m = slob_alloc(size + align, gfp, align, node);

483

484 if (!m)

485 return NULL;

/* Store the object size in the unsigned int at the head of the block; the object itself follows the align area. block = (align + object) */

486 *m = size;

487 ret = (void *)m + align;

488

489 trace_kmalloc_node(_RET_IP_, ret,

490 size, size + align, gfp, node);

491 } else {

/* If the request does not fit within a page, call slob_new_pages to get 2^order contiguous pages from the buddy allocator as a compound page (__GFP_COMP) */

492 unsigned int order = get_order(size);

493

494 ret = slob_new_pages(gfp | __GFP_COMP, get_order(size), node);

495 if (ret) {

496 struct page *page;

497 page = virt_to_page(ret);

498 page->private = size;

499 }

500

501 trace_kmalloc_node(_RET_IP_, ret,

502 size, PAGE_SIZE << order, gfp, node);

503 }

504

505 kmemleak_alloc(ret, size, 1, gfp);

506 return ret;

507 }

508 EXPORT_SYMBOL(__kmalloc_node);

509

510 void kfree(const void *block)

511 {

512 struct slob_page *sp;

513

514 trace_kfree(_RET_IP_, block);

515

516 if (unlikely(ZERO_OR_NULL_PTR(block)))

517 return;

518 kmemleak_free(block);

519

520 sp = slob_page(block);

/* If the memory belongs to slob, then block_addr = object_addr - align, because an extra align-sized area was allocated in front of the object; see __kmalloc_node() */

521 if (is_slob_page(sp)) {

522 int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);

523 unsigned int *m = (unsigned int *)(block - align);

524 slob_free(m, *m + align);

525 } else

/* Otherwise the memory is a large allocation taken straight from the page allocator: call put_page to release the page */

526 put_page(&sp->page);

527 }

528 EXPORT_SYMBOL(kfree);

529

530 /* can't use ksize for kmem_cache_alloc memory, only kmalloc */

531 size_t ksize(const void *block)

532 {

533 struct slob_page *sp;

534

535 BUG_ON(!block);

536 if (unlikely(block == ZERO_SIZE_PTR))

537 return 0;

538

539 sp = slob_page(block);

/* Return the space occupied by the object, which may exceed object_size because slob allocates in multiples of SLOB_UNIT. */

540 if (is_slob_page(sp)) {

541 int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);

542 unsigned int *m = (unsigned int *)(block - align);

543 return SLOB_UNITS(*m) * SLOB_UNIT;

544 } else

/* An object of a page or more allocated through kmalloc_node lives in a compound page, and the private field of the page records the object size in bytes */

545 return sp->page.private;

546 }

547 EXPORT_SYMBOL(ksize);

548

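/* With __kmalloc_node slob is already usable, so why have kmem_cache as well? Because blocks allocated through a kmem_cache all have the same size, there is no need for an extra align area inside the block to store the object size; with SLAB_DESTROY_BY_RCU, however, block_size = object_size + rcu_size. A kmem_cache can also specify a constructor to initialize objects. The advantage of __kmalloc_node, on the other hand, is that it can allocate objects of varying size. */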

549 struct kmem_cache {

550 unsigned int size, align;

551 unsigned long flags;

552 const char *name;

553 void (*ctor)(void *);

554 };

/* Allocate a kmem_cache object from the slob allocator and initialize it */

556 struct kmem_cache *kmem_cache_create(const char *name, size_t size,

557 size_t align, unsigned long flags, void (*ctor)(void *))

558 {

559 struct kmem_cache *c;

560

561 c = slob_alloc(sizeof(struct kmem_cache),

562 GFP_KERNEL, ARCH_KMALLOC_MINALIGN, -1);

/* With SLAB_DESTROY_BY_RCU, block_size = object_size + slob_rcu_size. The size field of slob_rcu later records this block size */

567 if (flags & SLAB_DESTROY_BY_RCU) {

568 /* leave room for rcu footer at the end of object */

569 c->size += sizeof(struct slob_rcu);

570 }

571 c->flags = flags;

572 c->ctor = ctor;

573 /* ignore alignment unless it's forced */

574 c->align = (flags & SLAB_HWCACHE_ALIGN) ? SLOB_ALIGN : 0;

575 if (c->align < ARCH_SLAB_MINALIGN)

576 c->align = ARCH_SLAB_MINALIGN;

577 if (c->align < align)

578 c->align = align;

579 } else if (flags & SLAB_PANIC)

580 panic("Cannot create slab cache %s\n", name);

581

582 kmemleak_alloc(c, sizeof(struct kmem_cache), 1, GFP_KERNEL);

583 return c;

584 }

585 EXPORT_SYMBOL(kmem_cache_create);

/* Return the kmem_cache memory to the slob allocator */

587 void kmem_cache_destroy(struct kmem_cache *c)

588 {

589 kmemleak_free(c);

590 if (c->flags & SLAB_DESTROY_BY_RCU)

591 rcu_barrier();

592 slob_free(c, sizeof(struct kmem_cache));

593 }

594 EXPORT_SYMBOL(kmem_cache_destroy);

595

/* Allocate a block according to the object information in the kmem_cache. If the cache's size is smaller than a page, the block comes from slob_alloc; otherwise it comes straight from the buddy allocator */

596 void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

597 {

598 void *b;

599

600 if (c->size < PAGE_SIZE) {

/* Note: slob is asked for object_size only, without the extra align_size that __kmalloc_node adds, because object_size is recorded in the size field of the kmem_cache. */

601 b = slob_alloc(c->size, flags, c->align, node);

602 trace_kmem_cache_alloc_node(_RET_IP_, b, c->size,

603 SLOB_UNITS(c->size) * SLOB_UNIT,

604 flags, node);

605 } else {

606 b = slob_new_pages(flags, get_order(c->size), node);

607 trace_kmem_cache_alloc_node(_RET_IP_, b, c->size,

608 PAGE_SIZE << get_order(c->size),

609 flags, node);

610 }

611

/* __kmalloc_node cannot run a constructor, which is another advantage of kmem_cache_alloc_node */

612 if (c->ctor)

613 c->ctor(b);

614

615 kmemleak_alloc_recursive(b, c->size, 1, c->flags, flags);

616 return b;

617 }

618 EXPORT_SYMBOL(kmem_cache_alloc_node);

619

/* Free an object that was allocated through a kmem_cache */

620 static void __kmem_cache_free(void *b, int size)

621 {

622 if (size < PAGE_SIZE)

/* No offset adjustment of the block pointer is needed */

623 slob_free(b, size);

624 else

625 slob_free_pages(b, get_order(size));

626 }

627

/* With RCU freeing, an RCU block is laid out as block = [object | slob_rcu] and slob_rcu->size holds block_size, so the start of the block is found by stepping back from the footer */

628 static void kmem_rcu_free(struct rcu_head *head)

629 {

630 struct slob_rcu *slob_rcu = (struct slob_rcu *)head;

631 void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));

632

633 __kmem_cache_free(b, slob_rcu->size);

634 }

635

/* Return a block to the slob allocator through the kmem_cache interface. With SLAB_DESTROY_BY_RCU the block is freed through RCU */

636 void kmem_cache_free(struct kmem_cache *c, void *b)

637 {

638 kmemleak_free_recursive(b, c->flags);

639 if (unlikely(c->flags & SLAB_DESTROY_BY_RCU)) {

640 struct slob_rcu *slob_rcu;

641 slob_rcu = b + (c->size - sizeof(struct slob_rcu));

642 INIT_RCU_HEAD(&slob_rcu->head);

643 slob_rcu->size = c->size;

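/* Register kmem_rcu_free on the RCU callback list and return immediately. Once every CPU has left its read-side critical section, kmem_rcu_free is called to free the block that is no longer in use */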

644 call_rcu(&slob_rcu->head, kmem_rcu_free);

645 } else {

646 __kmem_cache_free(b, c->size);

647 }

648

649 trace_kmem_cache_free(_RET_IP_, b);

650 }

651 EXPORT_SYMBOL(kmem_cache_free);

652

653 unsigned int kmem_cache_size(struct kmem_cache *c)

654 {

655 return c->size;

656 }

657 EXPORT_SYMBOL(kmem_cache_size);

658

659 const char *kmem_cache_name(struct kmem_cache *c)

660 {

661 return c->name;

662 }

663 EXPORT_SYMBOL(kmem_cache_name);

664

665 int kmem_cache_shrink(struct kmem_cache *d)

666 {

667 return 0;

668 }

669 EXPORT_SYMBOL(kmem_cache_shrink);

670

671 int kmem_ptr_validate(struct kmem_cache *a, const void *b)

676 static unsigned int slob_ready __read_mostly;

677

678 int slab_is_available(void)

679 {

680 return slob_ready;

681 }

682

683 void __init kmem_cache_init(void)

688 void __init kmem_cache_init_late(void)

689 {

690 /* Nothing to do */

691 }

References

"slob: introduce the SLOB allocator", http://lwn.net/Articles/157944/

"What is RCU, Fundamentally?", http://lwn.net/Articles/262464/