A Simple Reading of the Redis Source: dict.c


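0. Prerequisites

First, you should know the six Redis data structures: list, set, zset, string, hash, and stream. This article explains the underlying implementation of the Redis hash structure.

Second, you should know the common hash commands, including hset, hget, hlen, and hscan.

Third, you should know the basic principle of hashing and the usual ways to resolve collisions (open addressing, separate chaining, and a common overflow area). Redis resolves collisions with separate chaining.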

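1. Start with HGET

Q: What does Redis do when you type `hget key field`?

- Redis keeps all keys in a db, whose underlying structure is a dict (a key-value dictionary).
- Redis stores a type and an encoding with every value, covering the six types listed above.
- Redis looks up key in the db, O(1) on average. When the hash uses the OBJ_ENCODING_HT encoding, which is the case this article covers, the value it finds is itself a dict.
- If the value stored at key is not a hash, Redis returns a WRONGTYPE error.
- Redis uses field as the key into that inner dict to fetch the actual val, O(1) on average.
- Redis returns val to the client.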

    st=>start: start
    command=>inputoutput: hget key field
    server=>operation: value = dictFetchValue(db->dict, key)
    dict=>operation: val = dictFetchValue(value, field)
    out=>inputoutput: val
    e=>end: end

    st->command->server->dict->out->e
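
The two-level lookup described above can be sketched in plain C. This is a simplified illustration, not Redis code: `toy_obj`, `toy_dict`, `toy_find`, and `toy_hget` are hypothetical names, and a linear scan stands in for the hashed lookup so that the example stays short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Redis types (not the real API). */
enum toy_type { TOY_STRING, TOY_HASH };
enum toy_status { TOY_OK, TOY_NOT_FOUND, TOY_WRONG_TYPE };

typedef struct toy_entry { const char *key; void *val; } toy_entry;
typedef struct toy_dict { toy_entry *entries; size_t len; } toy_dict;

typedef struct toy_obj {
    enum toy_type type;
    void *ptr;              /* TOY_HASH: a toy_dict; TOY_STRING: a char array */
} toy_obj;

/* Linear scan for brevity; a real dict hashes the key to pick a bucket. */
void *toy_find(const toy_dict *d, const char *key) {
    for (size_t i = 0; i < d->len; i++)
        if (strcmp(d->entries[i].key, key) == 0) return d->entries[i].val;
    return NULL;
}

/* HGET key field: find key in the db, check its type, then find field. */
enum toy_status toy_hget(const toy_dict *db, const char *key,
                         const char *field, const char **out) {
    toy_obj *o = toy_find(db, key);
    if (o == NULL) return TOY_NOT_FOUND;
    if (o->type != TOY_HASH) return TOY_WRONG_TYPE;
    const char *val = toy_find(o->ptr, field);
    if (val == NULL) return TOY_NOT_FOUND;
    *out = val;
    return TOY_OK;
}
```

Both lookups go through the same `toy_find`, which mirrors the point of this section: the db and the hash value are the same kind of dictionary, nested one inside the other.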

2. dict implementation

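Q: How should dict be implemented to guarantee O(1) lookups?

From experience, it is easy to conclude that only a hash-based lookup gives O(1) time complexity on average.
To resolve collisions, Redis uses separate chaining.

Q: How should the hash table be designed?

We define the following structures to represent the hash table. dictEntry is a {Key, Value, Next} triple, and dictht holds an array of n dictEntry * pointers.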

    typedef struct dictEntry {
        void *key;
        union {
            void *val;
            uint64_t u64;
            int64_t s64;
            double d;
        } v;
        struct dictEntry *next;
    } dictEntry;

    typedef struct dictht {
        dictEntry **table;       // pointer array
        unsigned long size;      // length of table
        unsigned long sizemask;  // size - 1; if size = 8, it looks like 111 in binary
        unsigned long used;      // number of elements in the whole hash table
    } dictht;
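
As a rough sketch of how a lookup walks such a table, the following self-contained example uses simplified types (`entry` and `table`, hypothetical stand-ins for dictEntry and dictht) with string keys and int values. The djb2 hash in `toy_hash` is an assumption made for illustration only; Redis uses a different hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {          /* simplified dictEntry */
    const char *key;
    int val;
    struct entry *next;
} entry;

typedef struct table {          /* simplified dictht */
    entry **buckets;
    unsigned long size;         /* always a power of two */
    unsigned long sizemask;     /* size - 1 */
    unsigned long used;
} table;

/* djb2 string hash, chosen only for illustration. */
unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 0 on success, -1 if allocation fails. */
int table_init(table *t, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0); /* the mask needs a power of two */
    t->buckets = calloc(size, sizeof(entry *));
    if (t->buckets == NULL) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Insert at the head of the bucket's chain; duplicates are not checked. */
int table_add(table *t, const char *key, int val) {
    entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = toy_hash(key) & t->sizemask;
    e->key = key;
    e->val = val;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
    return 0;
}

/* Walk the chain in the key's bucket; returns NULL when the key is absent. */
entry *table_find(const table *t, const char *key) {
    for (entry *e = t->buckets[toy_hash(key) & t->sizemask]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```

Because size is a power of two, `hash & sizemask` gives the same result as `hash % size` without a division, which is why the structure stores sizemask next to size.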


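Q: How is the length of the hash table chosen?

- Suppose every table starts at the largest unsigned 64-bit length (about 2^64), and hset key field value is used to import 100 different keys, each with no more than 10 fields. Storing the tables would take 100 * 2^64 * 4 bytes, roughly 7 * 10^12 GB (assuming 4-byte pointers; 8-byte pointers double it), which is an enormous amount of space.
- Suppose the length is fixed at 4 and each key has n (n >= 100) fields. The average lookup time is then O(n/4) => O(n), which costs too much time.
- So Redis starts from a small initial size and adjusts the table length dynamically as needed.

Redis uses two conditions, one for growing and one for shrinking. They are built from three quantities: used / size (the load factor), dict_can_resize, and dict_force_resize_ratio.

The table grows when used/size >= 1 && (dict_can_resize == 1 || used/size > dict_force_resize_ratio).

The table shrinks when it is larger than the initial size and used/size falls below 10% (HASHTABLE_MIN_FILL).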
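
The two checks can be written as small predicates. This is a sketch under stated assumptions rather than the Redis functions themselves: `needs_expand` and `needs_shrink` are hypothetical names, every threshold is passed in as a parameter instead of being hard-coded, and the ratio uses integer division.

```c
#include <stdbool.h>

/* Grow when the table is at least full and either resizing is allowed
 * or the load factor exceeds the forced-resize ratio. */
bool needs_expand(unsigned long used, unsigned long size,
                  bool can_resize, unsigned long force_ratio) {
    if (size == 0) return true;     /* nothing allocated yet */
    return used >= size && (can_resize || used / size > force_ratio);
}

/* Shrink when the fill percentage drops below min_fill_percent, but never
 * go below the initial size. */
bool needs_shrink(unsigned long used, unsigned long size,
                  unsigned long initial_size, unsigned long min_fill_percent) {
    return size > initial_size && used * 100 / size < min_fill_percent;
}
```

For example, `needs_expand(8, 4, false, 5)` is false because a load factor of 2 does not exceed the forced ratio while resizing is disabled, whereas `needs_expand(24, 4, false, 5)` is true.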

Q: How is the hash table resized dynamically?

When Redis executes HSET or HDEL, it goes through the following steps:

    st=>start: start
    e=>end: end
    input=>inputoutput: hset or hdel
    cond=>condition: checkIfNeedRehash?
    op=>operation: build the new hash table
    op1=>operation: migrate data from the old table to the new table
    op2=>operation: free the old table

    st->input->cond(yes)->op->op1->op2->e
    cond(no)->e

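    /* Pseudocode: hashFunction stands for the dict's hash function. */
    dictht *Rehash(dictht *old_ht, unsigned long new_size) {
        /* init new ht */
        dictht *new_ht = zmalloc(sizeof(*new_ht));
        new_ht->table = zcalloc(new_size * sizeof(dictEntry *));
        new_ht->size = new_size;
        new_ht->sizemask = new_size - 1;
        new_ht->used = 0;
        /* migration */
        for (unsigned long i = 0; i < old_ht->size; i++) {
            /* get the first element of the current bucket */
            dictEntry *ptr = old_ht->table[i];
            /* migrate every element of this bucket */
            while (ptr != NULL) {
                dictEntry *next = ptr->next; /* save before relinking */
                unsigned long idx = hashFunction(ptr->key) & new_ht->sizemask;
                /* chaining: push onto the head of the bucket, empty or not */
                ptr->next = new_ht->table[idx];
                new_ht->table[idx] = ptr;
                new_ht->used++;
                ptr = next;
            }
        }
        return new_ht;
    }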

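Q: How efficient is this rehash?

Suppose moving one element takes 0.1 us. The cost of a one-shot rehash then grows with the amount of data as follows:

Entries | Rehash time | Short-term QPS impact
:-: | :-: | :-:
10,000 | 0.001 s | ↓ 0.1%
100,000 | 0.01 s | ↓ 1%
1,000,000 | 0.1 s | ↓ 10%
10,000,000 | 1 s | ↓ 100%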
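
The figures above are plain multiplication, which the following sketch reproduces. `rehash_seconds` and `qps_drop_percent` are hypothetical helper names, and the QPS impact is assumed to be the share of a one-second window during which the server is busy rehashing.

```c
/* Seconds spent moving `entries` entries at `us_per_entry` microseconds each. */
double rehash_seconds(double entries, double us_per_entry) {
    return entries * us_per_entry / 1e6;
}

/* Share of a one-second window consumed by the rehash, capped at 100%. */
double qps_drop_percent(double entries, double us_per_entry) {
    double pct = rehash_seconds(entries, us_per_entry) * 100.0;
    return pct > 100.0 ? 100.0 : pct;
}
```

The cost is linear in the number of entries, so a table ten times larger blocks the server for ten times as long.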

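Q: How can this be improved?

When a rehash is needed, allocate new_ht and keep it alongside the old table, move only a small amount of data at a time, and free old_ht once all the data has been moved.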

    typedef struct dict {
        dictType *type;
        void *privdata;
        dictht ht[2]; // ht[0]: old  ht[1]: new
        long rehashidx; /* rehashing not in progress if rehashidx == -1 */
        unsigned long iterators; /* number of iterators currently running */
    } dict;

Pseudocode:

    // return a status code to tell the caller if rehash has completed.
    int dictRehash(dict *d, int n) {
        if (d->rehashidx == -1) return 0; // no rehash in progress
        // move up to n buckets; stop early once all data has migrated
        while (n-- && d->ht[0].used != 0) {
            dictEntry *ptr = d->ht[0].table[d->rehashidx];
            // move every element of this bucket to the new hash table
            while (ptr) {
                dictEntry *next = ptr->next; // save before relinking
                unsigned long idx = d->type->hashFunction(ptr->key) & d->ht[1].sizemask;
                // chaining: push onto the head of the target bucket
                ptr->next = d->ht[1].table[idx];
                d->ht[1].table[idx] = ptr;
                d->ht[0].used--;
                d->ht[1].used++;
                ptr = next;
            }
            // bucket moved, clear it
            d->ht[0].table[d->rehashidx] = NULL;
            d->rehashidx++; // advance to the next bucket
        }
        if (d->ht[0].used == 0) {
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1; // must reset to -1
            return 0; // complete
        }
        return 1; // not complete, more to rehash
    }

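Recall the conditions under which Redis rehashes. used / size is the average number of elements per bucket, and a resize is forced once it exceeds dict_force_resize_ratio, which defaults to 5. A single step therefore moves only a handful of keys on average, say fewer than 10. Under the assumption above, one rehash step takes about 10 * 0.1 us = 1 us, which is negligible for Redis performance. An individual chain can be longer than the average, so this is a typical cost rather than a strict upper bound.

Below is the actual Redis implementation: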


    /* Performs N steps of incremental rehashing. Returns 1 if there are still
     * keys to move from the old to the new hash table, otherwise 0 is returned.
     *
     * Note that a rehashing step consists in moving a bucket (that may have more
     * than one key as we use chaining) from the old to the new hash table, however
     * since part of the hash table may be composed of empty spaces, it is not
     * guaranteed that this function will rehash even a single bucket, since it
     * will visit at max N*10 empty buckets in total, otherwise the amount of
     * work it does would be unbound and the function may block for a long time. */
    int dictRehash(dict *d, int n) {
        int empty_visits = n*10; /* Max number of empty buckets to visit. */
        if (!dictIsRehashing(d)) return 0;

        while(n-- && d->ht[0].used != 0) {
            dictEntry *de, *nextde;

            /* Note that rehashidx can't overflow as we are sure there are more
             * elements because ht[0].used != 0 */
            assert(d->ht[0].size > (unsigned long)d->rehashidx);
            while(d->ht[0].table[d->rehashidx] == NULL) {
                d->rehashidx++;
                if (--empty_visits == 0) return 1;
            }
            de = d->ht[0].table[d->rehashidx];
            /* Move all the keys in this bucket from the old to the new hash HT */
            while(de) {
                uint64_t h;

                nextde = de->next;
                /* Get the index in the new hash table */
                h = dictHashKey(d, de->key) & d->ht[1].sizemask;
                de->next = d->ht[1].table[h];
                d->ht[1].table[h] = de;
                d->ht[0].used--;
                d->ht[1].used++;
                de = nextde;
            }
            d->ht[0].table[d->rehashidx] = NULL;
            d->rehashidx++;
        }

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }

        /* More to rehash... */
        return 1;
    }

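Q: When does Redis rehash, and when does it decide that a rehash is needed?

In fact, every operation that accesses the dict, such as hset, hget, and hdel, checks whether the hash table is currently rehashing. If it is, the operation performs one dictRehash step, so the rehash finishes sooner.

Only update operations such as hset and hdel check the conditions above to decide whether the hash table needs to start a rehash.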
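
The interplay between normal operations and rehash steps can be sketched with a toy dictionary. Everything here is a simplified assumption for illustration: `idict`, `rehash_step`, `idict_add`, and `idict_find` are hypothetical names, keys are integers hashed by identity, the table only grows, and duplicate keys are not checked.

```c
#include <stdlib.h>

typedef struct node { unsigned long key; int val; struct node *next; } node;
typedef struct ht { node **b; unsigned long size, used; } ht;
typedef struct idict { ht t[2]; long rehashidx; } idict; /* -1: not rehashing */

/* Identity hash keeps the example short; any hash function would do. */
static unsigned long slot(const ht *h, unsigned long key) {
    return key & (h->size - 1);
}

int idict_init(idict *d) {
    d->t[0].b = calloc(4, sizeof(node *));
    if (d->t[0].b == NULL) return -1;
    d->t[0].size = 4; d->t[0].used = 0;
    d->t[1].b = NULL; d->t[1].size = 0; d->t[1].used = 0;
    d->rehashidx = -1;
    return 0;
}

/* Move one non-empty bucket from t[0] to t[1]; finish once t[0] is empty. */
void rehash_step(idict *d) {
    if (d->rehashidx == -1) return;
    if (d->t[0].used != 0) {
        /* Safe: remaining entries all live at or after rehashidx. */
        while (d->t[0].b[d->rehashidx] == NULL) d->rehashidx++;
        node *n = d->t[0].b[d->rehashidx];
        while (n) {
            node *next = n->next;
            unsigned long i = slot(&d->t[1], n->key);
            n->next = d->t[1].b[i];
            d->t[1].b[i] = n;
            d->t[0].used--; d->t[1].used++;
            n = next;
        }
        d->t[0].b[d->rehashidx++] = NULL;
    }
    if (d->t[0].used == 0) {            /* swap the new table into place */
        free(d->t[0].b);
        d->t[0] = d->t[1];
        d->t[1].b = NULL; d->t[1].size = 0; d->t[1].used = 0;
        d->rehashidx = -1;
    }
}

/* Updates both advance a running rehash and decide whether to start one. */
int idict_add(idict *d, unsigned long key, int val) {
    rehash_step(d);
    if (d->rehashidx == -1 && d->t[0].used >= d->t[0].size) {
        unsigned long ns = d->t[0].size * 2;
        node **nb = calloc(ns, sizeof(node *));
        if (nb == NULL) return -1;
        d->t[1].b = nb; d->t[1].size = ns; d->t[1].used = 0;
        d->rehashidx = 0;
    }
    node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    ht *h = d->rehashidx == -1 ? &d->t[0] : &d->t[1]; /* new keys go to the new table */
    unsigned long i = slot(h, key);
    n->key = key; n->val = val; n->next = h->b[i];
    h->b[i] = n; h->used++;
    return 0;
}

/* Reads only advance a running rehash; they must search both tables. */
node *idict_find(idict *d, unsigned long key) {
    rehash_step(d);
    for (int t = 0; t < 2; t++) {
        if (d->t[t].size != 0)
            for (node *n = d->t[t].b[slot(&d->t[t], key)]; n; n = n->next)
                if (n->key == key) return n;
        if (d->rehashidx == -1) break;  /* t[1] is unused when not rehashing */
    }
    return NULL;
}
```

While a rehash is running, a key may be in either table, which is why `idict_find` checks both, and why new keys go straight into the new table so that the old one can only shrink.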

3. Summary

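This article gave a rough introduction to how the Redis dict is designed and implemented. For more detail, read `dict.h` and `dict.c` in the Redis source.

The next article will cover how dict handles the scan operation.

Reposted from: https://my.oschina.net/tigerBin/blog/3050350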