LevelDB Source Code Reading: SkipList (Part 1)

The SkipList in LevelDB (Part 1)

Preface

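An earlier post briefly covered how skip lists work; this post looks at how LevelDB implements one.
This part covers the structure of the skip list and of its Node. The individual methods are covered in the next part.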

Code location: leveldb-master/db/skiplist.h

Diving into the Source

The `skiplist` structure

template<typename Key, class Comparator>
class SkipList {
 private:
  // Forward declaration of the Node struct
  struct Node;

 public:
  // Create a new SkipList object that will use "cmp" for comparing keys,
  // and will allocate memory using "*arena".  Objects allocated in the arena
  // must remain allocated for the lifetime of the skiplist object.
  explicit SkipList(Comparator cmp, Arena* arena);

  // Insert key into the list.
  // REQUIRES: nothing that compares equal to key is currently in the list.
  void Insert(const Key& key);

  // Returns true iff an entry that compares equal to key is in the list.
  bool Contains(const Key& key) const;

  // (The Iterator class is omitted here.)

 private:
  enum { kMaxHeight = 12 };

  // Immutable after construction
  Comparator const compare_;  // Key comparator
  Arena* const arena_;        // Arena used for allocations of nodes

  Node* const head_;          // Head node

  // Modified only by Insert().  Read racily by readers, but stale
  // values are ok.
  port::AtomicPointer max_height_;   // Height of the entire list

  // Returns the current maximum height
  inline int GetMaxHeight() const {
    return static_cast<int>(
        reinterpret_cast<intptr_t>(max_height_.NoBarrier_Load()));
  }

  // Read/written only by Insert().
  Random rnd_;

  // Allocates a new Node
  Node* NewNode(const Key& key, int height);
  // Picks a random height
  int RandomHeight();
  // Tests whether two keys compare equal
  bool Equal(const Key& a, const Key& b) const { return (compare_(a, b) == 0); }

  // Return true if key is greater than the data stored in "n"
  bool KeyIsAfterNode(const Key& key, Node* n) const;

  // Return the earliest node that comes at or after key.
  // Return nullptr if there is no such node.
  //
  // If prev is non-null, fills prev[level] with pointer to previous
  // node at "level" for every level in [0..max_height_-1].
  Node* FindGreaterOrEqual(const Key& key, Node** prev) const;

  // Return the latest node with a key < key.
  // Return head_ if there is no such node.
  Node* FindLessThan(const Key& key) const;

  // Return the last node in the list.
  // Return head_ if list is empty.
  Node* FindLast() const;

  // No copying allowed
  SkipList(const SkipList&);
  void operator=(const SkipList&);
};

Constructor

From how skip lists work, we know that the first node at every height is always the head node.

template<typename Key, class Comparator>
SkipList<Key,Comparator>::SkipList(Comparator cmp, Arena* arena)
    : compare_(cmp),
      arena_(arena),
      head_(NewNode(0 /* any key will do */, kMaxHeight)),
      max_height_(reinterpret_cast<void*>(1)),
      rnd_(0xdeadbeef) {
  for (int i = 0; i < kMaxHeight; i++) {
    // At every height, head_ initially has no next node
    head_->SetNext(i, nullptr);
  }
}
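The constructor stores the initial height 1 in max_height_ by casting the integer to void*, and GetMaxHeight() casts it back. The round trip can be sketched on its own as follows. PlainPointer, EncodeHeight, and DecodeHeight are hypothetical names made up for this illustration, and PlainPointer is only a barrier-free stand-in for port::AtomicPointer, not LevelDB code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, barrier-free stand-in for port::AtomicPointer.
class PlainPointer {
 public:
  explicit PlainPointer(void* p) : rep_(p) {}
  void* NoBarrier_Load() const { return rep_; }
  void NoBarrier_Store(void* v) { rep_ = v; }

 private:
  void* rep_;
};

// Pack a small integer height into a pointer-sized value,
// as the constructor does with reinterpret_cast<void*>(1).
inline void* EncodeHeight(int height) {
  return reinterpret_cast<void*>(static_cast<std::intptr_t>(height));
}

// Unpack it again, mirroring the casts in SkipList::GetMaxHeight().
inline int DecodeHeight(const PlainPointer& p) {
  return static_cast<int>(
      reinterpret_cast<std::intptr_t>(p.NoBarrier_Load()));
}
```

The pointer is never dereferenced; it is used purely as a pointer-sized integer slot, so the height can be read and written through the same wrapper as the node links.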

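The skip list's structure is fairly simple. Its most important members are the head node of the whole list (head_), the comparator (compare_), the memory arena (arena_), and an enum constant, kMaxHeight = 12, that caps the list's maximum height. (LevelDB's implementation uses the word "height"; I say "level" because I find it easier to follow, and the two mean the same thing here. Note, though, that LevelDB is not named after the skip list's levels but after the leveled structure of its SSTables.)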
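Since Equal() is defined as compare_(a, b) == 0, the comparator has to be a callable that returns a three-way result: negative, zero, or positive. A minimal sketch of such a comparator might look like this. IntComparator and the free function KeysEqual are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical comparator: returns <0, 0, or >0, in the style of memcmp.
struct IntComparator {
  int operator()(const std::uint64_t& a, const std::uint64_t& b) const {
    if (a < b) return -1;
    if (a > b) return +1;
    return 0;
  }
};

// Free-function version of SkipList::Equal(), for illustration only.
template <typename Key, class Comparator>
bool KeysEqual(const Comparator& cmp, const Key& a, const Key& b) {
  return cmp(a, b) == 0;
}
```

Passing the comparator as a template parameter lets the compiler inline the comparison instead of calling through a function pointer.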

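The class also contains a nested Iterator class and declares a number of functions. I will cover those in the next part; first, let's look at the structure of the skip list's Node.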

`Node`: the skip list node

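As mentioned earlier, a skip list is an improvement on the linked list. Much of its implementation resembles a linked list, with changes in a few places to support the features specific to skip lists.
Common linked lists, such as singly linked lists and doubly linked circular lists, are regular in shape. A skip list is less regular: the node that a given node points to as its next differs from level (or height) to level. The implementation therefore stores these next pointers in an array of atomic pointers, where the array index is the level the link belongs to. Here is the source: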

// Implementation details follow
template<typename Key, class Comparator>
struct SkipList<Key,Comparator>::Node {
  // explicit constructor, to prevent implicit conversions
  explicit Node(const Key& k) : key(k) { }

  Key const key;

  // Accessors/mutators for links.  Wrapped in methods so we can
  // add the appropriate barriers as necessary.
  // The parameter n selects the height.
  Node* Next(int n) {
    assert(n >= 0);
    // Use an 'acquire load' so that we observe a fully initialized
    // version of the returned Node.
    return reinterpret_cast<Node*>(next_[n].Acquire_Load());
  }
  // Sets this node's next node at height n to x
  void SetNext(int n, Node* x) {
    assert(n >= 0);
    // Use a 'release store' so that anybody who reads through this
    // pointer observes a fully initialized version of the inserted node.
    next_[n].Release_Store(x);
  }

  // No-barrier variants that can be safely used in a few locations.
  Node* NoBarrier_Next(int n) {
    assert(n >= 0);
    return reinterpret_cast<Node*>(next_[n].NoBarrier_Load());
  }
  void NoBarrier_SetNext(int n, Node* x) {
    assert(n >= 0);
    next_[n].NoBarrier_Store(x);
  }

 private:
  // Array of length equal to the node height.  next_[0] is lowest level link.
  // AtomicPointer: an atomic pointer wrapper
  port::AtomicPointer next_[1];
};
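The array is declared as next_[1], yet its comment says its length equals the node's height. That only works if extra space for the remaining links is allocated directly behind the node. The sketch below shows the idea under stated assumptions: ToyNode, NewToyNode, and FreeToyNode are hypothetical, plain pointers replace port::AtomicPointer, and std::malloc stands in for the arena. It is not LevelDB's NewNode, which is left for the next part.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical simplified node: plain pointers instead of atomic ones.
struct ToyNode {
  explicit ToyNode(int k) : key(k) {}

  int const key;

  ToyNode* Next(int n) { assert(n >= 0); return next_[n]; }
  void SetNext(int n, ToyNode* x) { assert(n >= 0); next_[n] = x; }

 private:
  // Declared with one slot; the allocation below reserves height - 1 more.
  ToyNode* next_[1];
};

// Allocate one block holding the node plus (height - 1) extra link slots,
// then construct the node in place. malloc is an assumption made for this
// sketch; the real code allocates from an Arena.
inline ToyNode* NewToyNode(int key, int height) {
  assert(height >= 1);
  std::size_t bytes = sizeof(ToyNode) + sizeof(ToyNode*) * (height - 1);
  void* mem = std::malloc(bytes);
  assert(mem != nullptr);
  ToyNode* node = new (mem) ToyNode(key);
  for (int i = 0; i < height; i++) {
    node->SetNext(i, nullptr);  // every level starts with no successor
  }
  return node;
}

// ToyNode has no resources of its own, so releasing the block is enough.
inline void FreeToyNode(ToyNode* node) { std::free(node); }
```

Indexing past the declared bound relies on the trailing-array idiom rather than on anything the C++ standard guarantees, so treat this as an illustration of the memory layout, not as portable code to copy.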

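In database management systems, transactions have four properties, known as ACID: atomicity, consistency, isolation, and durability.
Atomicity means a transaction is an indivisible unit of work: either all of its operations happen or none of them do. For AtomicPointer the word is used in the analogous concurrency sense: other threads see a load or store of the pointer either in full or not at all.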

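port::AtomicPointer is a wrapper class whose purpose is to make pointer operations atomic. It relies on memory barriers for synchronization (memory barriers are not covered in detail here), and its only data member is a void* pointer.
Code location: leveldb-master/port/atomic_pointer.h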

class AtomicPointer {
 private:
  // The only data member: the wrapped pointer
  void* rep_;

 public:
  AtomicPointer() { }
  explicit AtomicPointer(void* p) : rep_(p) {}
  inline void* NoBarrier_Load() const { return rep_; }
  inline void NoBarrier_Store(void* v) { rep_ = v; }
  inline void* Acquire_Load() const {
    void* result = rep_;
    // Barrier after the read: later accesses cannot move before it
    MemoryBarrier();
    return result;
  }
  inline void Release_Store(void* v) {
    // Barrier before the write: earlier accesses cannot move after it
    MemoryBarrier();
    rep_ = v;
  }
};
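For readers more familiar with standard C++11 atomics, the same four operations can be approximated with std::atomic and explicit memory orders. This is a sketch for comparison only: StdAtomicPointer is a hypothetical class name, and mapping the no-barrier variants to memory_order_relaxed is my assumption about the closest standard equivalent.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical analogue of port::AtomicPointer built on std::atomic.
class StdAtomicPointer {
 public:
  StdAtomicPointer() : rep_(nullptr) {}
  explicit StdAtomicPointer(void* p) : rep_(p) {}

  // Atomic, but imposes no ordering on surrounding memory accesses.
  void* NoBarrier_Load() const {
    return rep_.load(std::memory_order_relaxed);
  }
  void NoBarrier_Store(void* v) {
    rep_.store(v, std::memory_order_relaxed);
  }

  // Acquire: reads after this load cannot be reordered before it.
  void* Acquire_Load() const {
    return rep_.load(std::memory_order_acquire);
  }
  // Release: writes before this store cannot be reordered after it.
  void Release_Store(void* v) {
    rep_.store(v, std::memory_order_release);
  }

 private:
  std::atomic<void*> rep_;
};
```

Pairing a release store in the writer with an acquire load in the reader is what lets a reader that sees the new pointer also see the fully initialized node behind it.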

Summary

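The earlier post covered how skip lists work; this part covered the skip list's structure.
Using the array index to represent the height is a neat trick. The overall structure of a skip list may not be as regular as a linked list's, but it is not hard to understand, and using it feels much like using a linked list.
For example, to reach the next node in an ordinary linked list we would write node->next. The skip list provides a Next() method instead, so we write node->Next(n), where n is the height.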
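To make the comparison concrete, here is a small sketch of walking a skip list at a single level through Next(n). LevelNode and KeysAtLevel are hypothetical, and the links are kept in a std::vector for brevity rather than in the trailing array LevelDB uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified node; links live in a vector, one per level.
struct LevelNode {
  LevelNode(int k, int height) : key(k), next_(height, nullptr) {}

  int key;

  LevelNode* Next(int n) const { return next_[n]; }
  void SetNext(int n, LevelNode* x) { next_[n] = x; }

 private:
  std::vector<LevelNode*> next_;
};

// Follow the links of one level from head and collect the keys seen.
// The head node itself carries no real key and is skipped.
inline std::vector<int> KeysAtLevel(const LevelNode* head, int level) {
  std::vector<int> keys;
  for (LevelNode* x = head->Next(level); x != nullptr; x = x->Next(level)) {
    keys.push_back(x->key);
  }
  return keys;
}
```

At level 0 the walk visits every node, exactly like a singly linked list; at higher levels it skips the nodes that are too short to have a link there.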
Next, we will go through the skip list's methods in detail.