Parameters of the Cuckoo Hash Function: CuckooHash (Cuckoo Hashing)

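Concept:

Definition:

Cuckoo hashing (CuckooHash) was proposed as a way to resolve hash collisions. Every key has a small, fixed number of candidate positions, and the scheme trades a small amount of extra computation for better use of space.

Features:

Low space overhead and fast lookups: a lookup only has to probe the key's candidate positions (two when two hash functions are used).

Origin:

The name comes from the cuckoo's breeding habit. The cuckoo does not build its own nest but lays its eggs in other birds' nests, and the chick that hatches first pushes the other eggs out so that it alone receives the "motherly love". Collision handling works in a similar way: a newly inserted key may push out the key that currently occupies its position.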

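Algorithm description:

Use hashA and hashB to compute the two candidate positions of a key:

1. If both positions are empty, insert the key into either one.

2. If exactly one position is empty, insert the key into that position.

3. If neither position is empty, evict the occupant of one position and insert the key there. The evicted key then runs the same procedure to find its other position, and this repeats until an insertion succeeds.

4. If the number of evictions reaches a threshold, the hash table is considered full and is rehashed.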
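The four steps above can be sketched in a few lines for non-negative int keys. This is only an illustrative sketch, not the implementation developed below: the class name CuckooSketch, the two toy hash functions, the EMPTY marker and the MAX_KICKS threshold are all assumptions made for the example, and where a real table would rehash (step 4) the sketch simply returns the key that was left without a slot.

```java
import java.util.Arrays;

// Minimal sketch of the four steps for non-negative int keys.
// hashA, hashB, EMPTY and MAX_KICKS are illustrative assumptions.
class CuckooSketch {
    static final int EMPTY = -1;     // marks a free slot, so keys must be >= 0
    static final int MAX_KICKS = 16; // eviction threshold of step 4

    final int[] table;

    CuckooSketch(int size) {
        table = new int[size];
        Arrays.fill(table, EMPTY);
    }

    int hashA(int key) { return key % table.length; }
    int hashB(int key) { return (key / table.length) % table.length; }

    boolean contains(int key) {
        return table[hashA(key)] == key || table[hashB(key)] == key;
    }

    // Returns EMPTY when every key found a slot, otherwise the key that is
    // still homeless after MAX_KICKS evictions (a real table would rehash).
    int insert(int key) {
        if (contains(key)) return EMPTY;
        int pos = hashA(key);
        for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
            // Steps 1 and 2: use whichever candidate slot is free.
            if (table[hashA(key)] == EMPTY) { table[hashA(key)] = key; return EMPTY; }
            if (table[hashB(key)] == EMPTY) { table[hashB(key)] = key; return EMPTY; }
            // Step 3: evict the occupant of pos and carry on with it.
            int evicted = table[pos];
            table[pos] = key;
            key = evicted;
            // The evicted key has to move to its other candidate slot.
            pos = (pos == hashA(key)) ? hashB(key) : hashA(key);
        }
        return key;
    }

    public static void main(String[] args) {
        CuckooSketch t = new CuckooSketch(7);
        for (int key : new int[] {10, 17, 24}) {
            t.insert(key);
        }
        System.out.println(Arrays.toString(t.table));
    }
}
```

With a table of 7 slots, key 24 has only slot 3 as a candidate, so inserting it evicts 10, which then moves to its second candidate, slot 1.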

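Implementation:

Cuckoo hashing needs a family of hash functions, so we first define an interface through which the table obtains them.

public interface HashFamily<AnyType> {
    // Compute the hash value of x using hash function number `which`
    int hash(AnyType x, int which);

    // Return the number of hash functions in the family
    int getNumberOfFunctions();

    // Replace the current hash functions with a new set
    void generateNewFunctions();
}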
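As an illustration of what an implementation of this interface might look like, the sketch below hashes strings with one randomly chosen multiplier per function, so that generateNewFunctions() really does produce a different set of functions. The class name RandomStringHashFamily and the choice of random odd multipliers are assumptions made for this example; the interface is repeated only so that the block compiles on its own.

```java
import java.util.Random;

// Repeated here only so that the sketch compiles on its own.
interface HashFamily<AnyType> {
    int hash(AnyType x, int which);
    int getNumberOfFunctions();
    void generateNewFunctions();
}

// Hypothetical example class: one random multiplier per hash function.
class RandomStringHashFamily implements HashFamily<String> {
    private final int[] multipliers;
    private final Random random = new Random();

    RandomStringHashFamily(int numFunctions) {
        multipliers = new int[numFunctions];
        generateNewFunctions();
    }

    @Override
    public int hash(String x, int which) {
        int multiplier = multipliers[which];
        int hashVal = 0;
        for (int i = 0; i < x.length(); i++) {
            hashVal = multiplier * hashVal + x.charAt(i);
        }
        return hashVal; // may be negative after int overflow
    }

    @Override
    public int getNumberOfFunctions() {
        return multipliers.length;
    }

    @Override
    public void generateNewFunctions() {
        for (int i = 0; i < multipliers.length; i++) {
            multipliers[i] = random.nextInt() | 1; // force an odd multiplier
        }
    }

    public static void main(String[] args) {
        RandomStringHashFamily family = new RandomStringHashFamily(2);
        System.out.println(family.hash("abc", 0) + " " + family.hash("abc", 1));
        family.generateNewFunctions();
        System.out.println(family.hash("abc", 0) + " " + family.hash("abc", 1));
    }
}
```

The returned value can be negative, so whoever uses it as an array index has to reduce it to the range of the table first.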

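Fields:

// Maximum load factor
private static final double MAX_LOAD = 0.4;

// Number of rehashes allowed before the table is expanded instead
private static final int ALLOWED_REHASHES = 1;

// Default table size
private static final int DEFAULT_TABLE_SIZE = 101;

// The family of hash functions
private final HashFamily<? super AnyType> hashFunctions;

// Number of hash functions in the family
private final int numHashFunctions;

// The table itself
private AnyType[] array;

// Number of elements currently stored in the table
private int currentSize;

// Number of rehashes performed since the last expansion
private int rehashes = 0;

// Random number generator used to pick the position to evict
// (requires import java.util.Random)
private Random r = new Random();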

Initialization:

public CuckooHashTable(HashFamily<? super AnyType> hf) {
    this(hf, DEFAULT_TABLE_SIZE);
}

// Allocate a table whose length is the first prime >= size
public CuckooHashTable(HashFamily<? super AnyType> hf, int size) {
    allocateArray(nextPrime(size));
    doClear();
    hashFunctions = hf;
    numHashFunctions = hf.getNumberOfFunctions();
}

public void makeEmpty() {
    doClear();
}

// Remove all elements
private void doClear() {
    currentSize = 0;
    for (int i = 0; i < array.length; i++) {
        array[i] = null;
    }
}

// Allocate the backing array
@SuppressWarnings("unchecked")
private void allocateArray(int arraySize) {
    array = (AnyType[]) new Object[arraySize];
}

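The hash function:

/**
 * @param x the element to hash
 * @param which index of the hash function to use
 * @return a position in the range [0, array.length)
 */
private int myHash(AnyType x, int which) {
    // Ask the hash family for the raw hash value
    int hashVal = hashFunctions.hash(x, which);
    // Reduce it to a valid index; % can yield a negative result in Java
    hashVal %= array.length;
    if (hashVal < 0) {
        hashVal += array.length;
    }
    return hashVal;
}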
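The correction for negative values matters because Java's % operator takes the sign of the dividend, and a raw hash value can become negative through int overflow. The small sketch below (the class name ModDemo and the helper toIndex are made up for the example) shows the same reduction next to Math.floorMod, which gives the same result in one call.

```java
// Hypothetical demo of the index reduction used in myHash.
class ModDemo {
    // Same two-step reduction as in myHash
    static int toIndex(int hashVal, int tableLength) {
        hashVal %= tableLength;
        if (hashVal < 0) {
            hashVal += tableLength;
        }
        return hashVal;
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);               // -2: not a valid array index
        System.out.println(toIndex(-7, 5));       // 3
        System.out.println(Math.floorMod(-7, 5)); // 3
    }
}
```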

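Checking whether an element exists:

/**
 * Find the position of an element.
 * @param x the element to look for
 * @return the current position of x, or -1 if it is not in the table
 */
private int findPos(AnyType x) {
    // Try every hash function, since we do not know which one placed x
    for (int i = 0; i < numHashFunctions; i++) {
        // Candidate position under hash function i
        int pos = myHash(x, i);
        // Is x stored at this position?
        if (array[pos] != null && array[pos].equals(x)) {
            return pos;
        }
    }
    return -1;
}

public boolean contains(AnyType x) {
    return findPos(x) != -1;
}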

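Removing an element:

/**
 * Remove an element: look it up first and clear its slot if it is present.
 * @param x the element to remove
 * @return true if x was in the table and has been removed
 */
public boolean remove(AnyType x) {
    int pos = findPos(x);
    if (pos != -1) {
        array[pos] = null;
        currentSize--;
    }
    return pos != -1;
}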

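Inserting an element:

/**
 * Insert: if the element is already present, return false. Otherwise check
 * whether the table has reached its maximum load and expand it if so, then
 * call insertHelper to do the actual insertion.
 * @param x the element to insert
 * @return true if x was inserted, false if it was already present
 */
public boolean insert(AnyType x) {
    if (contains(x)) {
        return false;
    }
    if (currentSize >= array.length * MAX_LOAD) {
        expand();
    }
    return insertHelper(x);
}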

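The insertion procedure in detail:

/**
 * a. Go through the hash functions and compute every position where the
 *    element may be stored. If one of them is empty, store it there and stop.
 * b. If no position is free, pick one of the candidate positions at random.
 * c. Put the element into that position and continue with the element that
 *    was evicted from it.
 * d. Go back to step a with the evicted element.
 * e. If no free position is found within the allowed number of attempts,
 *    look at the rehash counter: if it exceeds the maximum, expand the table,
 *    otherwise rehash with a new set of hash functions.
 */
private boolean insertHelper(AnyType x) {
    // Maximum number of evictions per attempt
    final int COUNT_LIMIT = 100;
    while (true) {
        // Position used by the previous eviction
        int lastPos = -1;
        int pos;
        for (int count = 0; count < COUNT_LIMIT; count++) {
            for (int i = 0; i < numHashFunctions; i++) {
                pos = myHash(x, i);
                // Free position found: store x and finish
                if (array[pos] == null) {
                    array[pos] = x;
                    currentSize++;
                    return true;
                }
            }
            // No free position: pick a random candidate to evict, trying up
            // to 5 times to avoid the position used in the previous round
            int tries = 0;
            do {
                pos = myHash(x, r.nextInt(numHashFunctions));
            } while (pos == lastPos && tries++ < 5);
            // Swap x with the current occupant
            AnyType temp = array[lastPos = pos];
            array[pos] = x;
            x = temp;
        }
        // Too many evictions: expand the table or rehash, then try again
        if (++rehashes > ALLOWED_REHASHES) {
            expand();
            rehashes = 0;
        } else {
            rehash();
        }
    }
}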

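Expanding the table and rehashing:

// Grow the table by a factor of 1 / MAX_LOAD
private void expand() {
    rehash((int) (array.length / MAX_LOAD));
}

// Keep the table size but switch to a new set of hash functions
private void rehash() {
    hashFunctions.generateNewFunctions();
    rehash(array.length);
}

// Allocate a new table and reinsert every element of the old one
private void rehash(int newLength) {
    AnyType[] oldArray = array;
    allocateArray(nextPrime(newLength));
    currentSize = 0;
    for (AnyType str : oldArray) {
        if (str != null) {
            insert(str);
        }
    }
}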

Testing:

public class CuckooHashTableTest {
    // The family of hash functions
    private static HashFamily<String> hashFamily = new HashFamily<String>() {
        // Select one of the hash functions according to `which`
        @Override
        public int hash(String x, int which) {
            int hashVal = 0;
            switch (which) {
                case 0:
                    // Sum of the character codes
                    for (int i = 0; i < x.length(); i++) {
                        hashVal += x.charAt(i);
                    }
                    break;
                case 1:
                    // Polynomial hash with multiplier 37
                    for (int i = 0; i < x.length(); i++) {
                        hashVal = 37 * hashVal + x.charAt(i);
                    }
                    break;
            }
            return hashVal;
        }

        // Number of hash functions in the family
        @Override
        public int getNumberOfFunctions() {
            return 2;
        }

        // Left empty in this test, so a rehash reuses the same two functions
        @Override
        public void generateNewFunctions() {
        }
    };

    public static void main(String[] args) {
        // Create the cuckoo hash table
        CuckooHashTable<String> cuckooHashTable = new CuckooHashTable<>(hashFamily, 5);
        String[] strs = {"abc", "aba", "abcc", "abca"};
        // Insert the strings
        for (int i = 0; i < strs.length; i++) {
            cuckooHashTable.insert(strs[i]);
        }
        // Print the table
        cuckooHashTable.printArray();
    }
}

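Output (the two header lines come from the string literals in printArray and read "The current hash table is as follows:" and "Table size: 13"):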

当前散列表如下：

表的大小为：13

current pos: 1 current value: abca

current pos: 3 current value: abcc

current pos: 6 current value: aba

current pos: 8 current value: abc
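The table size and the positions in this output can be checked by hand. The table starts with nextPrime(5) = 5 slots and a threshold of 5 * 0.4 = 2 elements, so the third insert triggers expand(), which calls rehash((int) (5 / 0.4)) = rehash(12) and allocates nextPrime(12) = 13 slots. All four strings then land in the position given by the first hash function (the sum of the character codes) modulo 13. The sketch below recomputes these values; the class name WorkedExample and its helpers are made up for the example.

```java
// Hypothetical helper that recomputes the positions shown in the output.
class WorkedExample {
    // First hash function of the test: sum of the character codes
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    static int slot(String s, int tableLength) {
        return charSum(s) % tableLength;
    }

    public static void main(String[] args) {
        // Length requested by expand() for a table of 5 slots
        System.out.println((int) (5 / 0.4)); // 12, and the next prime is 13
        for (String s : new String[] {"abc", "aba", "abcc", "abca"}) {
            System.out.println(s + " -> " + charSum(s) + " % 13 = " + slot(s, 13));
        }
    }
}
```

For example, "abc" gives 97 + 98 + 99 = 294 and 294 % 13 = 8, which matches the position printed for abc.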

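Complete code of CuckooHashTable:

import java.util.Random;

public class CuckooHashTable<AnyType> {
    public CuckooHashTable(HashFamily<? super AnyType> hf) {
        this(hf, DEFAULT_TABLE_SIZE);
    }

    // Allocate a table whose length is the first prime >= size
    public CuckooHashTable(HashFamily<? super AnyType> hf, int size) {
        allocateArray(nextPrime(size));
        doClear();
        hashFunctions = hf;
        numHashFunctions = hf.getNumberOfFunctions();
    }

    public void makeEmpty() {
        doClear();
    }

    public boolean contains(AnyType x) {
        return findPos(x) != -1;
    }

    /**
     * @param x the element to hash
     * @param which index of the hash function to use
     * @return a position in the range [0, array.length)
     */
    private int myHash(AnyType x, int which) {
        // Ask the hash family for the raw hash value
        int hashVal = hashFunctions.hash(x, which);
        // Reduce it to a valid index; % can yield a negative result in Java
        hashVal %= array.length;
        if (hashVal < 0) {
            hashVal += array.length;
        }
        return hashVal;
    }

    /**
     * Find the position of an element.
     * @param x the element to look for
     * @return the current position of x, or -1 if it is not in the table
     */
    private int findPos(AnyType x) {
        // Try every hash function, since we do not know which one placed x
        for (int i = 0; i < numHashFunctions; i++) {
            int pos = myHash(x, i);
            if (array[pos] != null && array[pos].equals(x)) {
                return pos;
            }
        }
        return -1;
    }

    /**
     * Remove an element: look it up first and clear its slot if it is present.
     * @param x the element to remove
     * @return true if x was in the table and has been removed
     */
    public boolean remove(AnyType x) {
        int pos = findPos(x);
        if (pos != -1) {
            array[pos] = null;
            currentSize--;
        }
        return pos != -1;
    }

    /**
     * Insert: if the element is already present, return false. Otherwise check
     * whether the table has reached its maximum load and expand it if so, then
     * call insertHelper to do the actual insertion.
     * @param x the element to insert
     * @return true if x was inserted, false if it was already present
     */
    public boolean insert(AnyType x) {
        if (contains(x)) {
            return false;
        }
        if (currentSize >= array.length * MAX_LOAD) {
            expand();
        }
        return insertHelper(x);
    }

    /**
     * The insertion procedure in detail:
     * a. Go through the hash functions and compute every position where the
     *    element may be stored. If one of them is empty, store it there and stop.
     * b. If no position is free, pick one of the candidate positions at random.
     * c. Put the element into that position and continue with the element that
     *    was evicted from it.
     * d. Go back to step a with the evicted element.
     * e. If no free position is found within the allowed number of attempts,
     *    look at the rehash counter: if it exceeds the maximum, expand the
     *    table, otherwise rehash with a new set of hash functions.
     * @param x the element to insert
     * @return true once x has been stored
     */
    private boolean insertHelper(AnyType x) {
        // Maximum number of evictions per attempt
        final int COUNT_LIMIT = 100;
        while (true) {
            // Position used by the previous eviction
            int lastPos = -1;
            int pos;
            for (int count = 0; count < COUNT_LIMIT; count++) {
                for (int i = 0; i < numHashFunctions; i++) {
                    pos = myHash(x, i);
                    // Free position found: store x and finish
                    if (array[pos] == null) {
                        array[pos] = x;
                        currentSize++;
                        return true;
                    }
                }
                // No free position: pick a random candidate to evict, trying
                // up to 5 times to avoid the position used in the previous round
                int tries = 0;
                do {
                    pos = myHash(x, r.nextInt(numHashFunctions));
                } while (pos == lastPos && tries++ < 5);
                // Swap x with the current occupant
                AnyType temp = array[lastPos = pos];
                array[pos] = x;
                x = temp;
            }
            // Too many evictions: expand the table or rehash, then try again
            if (++rehashes > ALLOWED_REHASHES) {
                expand();
                rehashes = 0;
            } else {
                rehash();
            }
        }
    }

    // Grow the table by a factor of 1 / MAX_LOAD
    private void expand() {
        rehash((int) (array.length / MAX_LOAD));
    }

    // Keep the table size but switch to a new set of hash functions
    private void rehash() {
        hashFunctions.generateNewFunctions();
        rehash(array.length);
    }

    // Allocate a new table and reinsert every element of the old one
    private void rehash(int newLength) {
        AnyType[] oldArray = array;
        allocateArray(nextPrime(newLength));
        currentSize = 0;
        for (AnyType str : oldArray) {
            if (str != null) {
                insert(str);
            }
        }
    }

    // Remove all elements
    private void doClear() {
        currentSize = 0;
        for (int i = 0; i < array.length; i++) {
            array[i] = null;
        }
    }

    // Allocate the backing array
    @SuppressWarnings("unchecked")
    private void allocateArray(int arraySize) {
        array = (AnyType[]) new Object[arraySize];
    }

    // Print the table length and every occupied position
    public void printArray() {
        System.out.println("当前散列表如下：");
        System.out.println("表的大小为：" + array.length);
        for (int i = 0; i < array.length; i++) {
            if (array[i] != null) {
                System.out.println("current pos: " + i + " current value: " + array[i]);
            }
        }
    }

    // Maximum load factor
    private static final double MAX_LOAD = 0.4;
    // Number of rehashes allowed before the table is expanded instead
    private static final int ALLOWED_REHASHES = 1;
    // Default table size
    private static final int DEFAULT_TABLE_SIZE = 101;
    // The family of hash functions
    private final HashFamily<? super AnyType> hashFunctions;
    // Number of hash functions in the family
    private final int numHashFunctions;
    // The table itself
    private AnyType[] array;
    // Number of elements currently stored in the table
    private int currentSize;
    // Number of rehashes performed since the last expansion
    private int rehashes = 0;
    // Random number generator used to pick the position to evict
    private Random r = new Random();

    // Return the first prime >= n
    private static int nextPrime(int n) {
        while (!isPrime(n)) {
            n++;
        }
        return n;
    }

    // Trial division; 0 and 1 are not prime
    private static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }
}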

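Optimizations (reducing hash collisions):

1. Turn the one-dimensional table into a multi-dimensional one, so that each position becomes a bucket with 4 slots (4-way).

2. Let one key map to several values.

3. Use more hash functions, going from two to several.

4. Use more hash tables, which is similar to the first option.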
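The first optimization can be sketched as follows: every key still has two candidate buckets, but each bucket holds up to 4 keys, so an eviction is only needed when all 8 candidate slots are taken. This is a rough sketch under assumptions, not a tuned implementation: the class name BucketedCuckooSketch, the two bucket functions derived from String.hashCode(), the MAX_KICKS threshold and the fixed random seed are all made up for the example, and a key that cannot be placed is simply returned instead of triggering a rehash.

```java
import java.util.Random;

// Hypothetical sketch of a cuckoo table with 4-slot buckets.
class BucketedCuckooSketch {
    static final int SLOTS = 4;      // slots per bucket (4-way)
    static final int MAX_KICKS = 32; // eviction threshold

    final String[][] buckets;
    final Random random = new Random(42);

    BucketedCuckooSketch(int numBuckets) {
        buckets = new String[numBuckets][SLOTS];
    }

    // Two illustrative bucket functions derived from String.hashCode()
    int bucketA(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    int bucketB(String key) {
        int mixed = Integer.rotateLeft(key.hashCode() * 0x9E3779B9, 16);
        return Math.floorMod(mixed, buckets.length);
    }

    boolean contains(String key) {
        return holds(bucketA(key), key) || holds(bucketB(key), key);
    }

    private boolean holds(int bucket, String key) {
        for (String stored : buckets[bucket]) {
            if (key.equals(stored)) return true;
        }
        return false;
    }

    // Store key in the first free slot of the bucket, if there is one
    private boolean tryPlace(int bucket, String key) {
        for (int s = 0; s < SLOTS; s++) {
            if (buckets[bucket][s] == null) {
                buckets[bucket][s] = key;
                return true;
            }
        }
        return false;
    }

    // Returns null on success, or the key that is left without a slot
    String insert(String key) {
        if (contains(key)) return null;
        for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
            if (tryPlace(bucketA(key), key) || tryPlace(bucketB(key), key)) {
                return null;
            }
            // Both buckets are full: evict a random slot of a random bucket
            int bucket = random.nextBoolean() ? bucketA(key) : bucketB(key);
            int slot = random.nextInt(SLOTS);
            String evicted = buckets[bucket][slot];
            buckets[bucket][slot] = key;
            key = evicted;
        }
        return key;
    }

    public static void main(String[] args) {
        BucketedCuckooSketch table = new BucketedCuckooSketch(4);
        for (String s : new String[] {"abc", "aba", "abcc", "abca"}) {
            table.insert(s);
        }
        System.out.println(table.contains("abc") + " " + table.contains("zzz"));
    }
}
```

A lookup now compares against at most 8 stored keys instead of 2, which is the price paid for fewer evictions.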
