While reading the ClickHouse source code today, I noticed that ClickHouse uses CityHash128 as the hash for its block checksums. The source comment says:

The first 16 bytes are the checksum from all other bytes of the block. Now only CityHash128 is used.

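CityHash is a hash algorithm published by Google that I had never heard of before.
Its GitHub repository has not been updated in six years: google/cityhash: Automatically exported from code.google.com/p/cityhash
I could not find much material about it online either. As far as I can tell, using CityHash from C++ means compiling and installing it from source.
I did find a Python package for CityHash: cityhash · PyPI
It installs directly with pip: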

pip install cityhash

Usage:

from cityhash import CityHash32, CityHash64, CityHash128
print(CityHash64('16'))

The result:

>>> print(CityHash64('16'))
179832329939032581

ClickHouse's built-in cityHash64 function can also be called directly:

ubuntu :) SELECT cityHash64('16') AS CityHash, toTypeName(CityHash) AS type;

SELECT cityHash64('16') AS CityHash, toTypeName(CityHash) AS type

┌───────────CityHash─┬─type───┐
│ 696724486834661759 │ UInt64 │
└────────────────────┴────────┘

1 rows in set. Elapsed: 0.001 sec.

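The two results differ. My first guess was an encoding issue, but '16' is plain ASCII, so encoding alone probably does not explain it. A more likely cause is a version mismatch: ClickHouse bundles CityHash 1.0.2 (its source tree carries it as contrib/cityhash102), while the cityhash package on PyPI appears to wrap a newer CityHash release, and the two versions do not produce the same values.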
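One quick way to check the encoding theory is to look at the bytes that would actually be hashed. The sketch below uses only the Python standard library. It assumes the Python side encodes the string as UTF-8 before hashing and that ClickHouse hashes the raw bytes of its String value; both assumptions should be verified against the respective documentation.

```python
# Inspect the bytes behind the literal '16' under different encodings.
text = "16"

utf8_bytes = text.encode("utf-8")        # what a UTF-8 client would hash
utf16_bytes = text.encode("utf-16-le")   # a wide encoding, for contrast

print(utf8_bytes.hex())    # 3136
print(utf16_bytes.hex())   # 31003600

# For pure-ASCII input, UTF-8, ASCII and Latin-1 all yield the same bytes,
# so an encoding mismatch would only matter if one side used a wide encoding.
assert utf8_bytes == text.encode("ascii") == text.encode("latin-1")
```

If both sides hash the same two bytes and still disagree, the difference lies in the hash implementation rather than in the text encoding.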

The ClickHouse JDBC driver ships its own CityHash implementation; I have not verified whether it is correct:
clickhouse-jdbc/ClickHouseCityHash.java at master · yandex/clickhouse-jdbc
The code is as follows:

/*
 * Copyright 2017 YANDEX LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * Copyright (C) 2012 tamtam180
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package ru.yandex.clickhouse.util;

/**
 * @author tamtam180 - kirscheless at gmail.com
 * @see http://google-opensource.blogspot.jp/2011/04/introducing-cityhash.html
 * @see http://code.google.com/p/cityhash/
 *
 */

/**
 * NOTE: The code is modified to be compatible with CityHash128 used in ClickHouse
 */
public class ClickHouseCityHash {

    private static final long k0 = 0xc3a5c85c97cb3127L;
    private static final long k1 = 0xb492b66fbe98f273L;
    private static final long k2 = 0x9ae16a3b2f90404fL;
    private static final long k3 = 0xc949d7c7509e6557L;

    private static long toLongLE(byte[] b, int i) {
        return (((long)b[i+7] << 56) +
                ((long)(b[i+6] & 255) << 48) +
                ((long)(b[i+5] & 255) << 40) +
                ((long)(b[i+4] & 255) << 32) +
                ((long)(b[i+3] & 255) << 24) +
                ((b[i+2] & 255) << 16) +
                ((b[i+1] & 255) <<  8) +
                ((b[i+0] & 255) <<  0));
    }

    private static long toIntLE(byte[] b, int i) {
        return (((b[i+3] & 255L) << 24) +
                ((b[i+2] & 255L) << 16) +
                ((b[i+1] & 255L) << 8) +
                ((b[i+0] & 255L) << 0));
    }

    private static long fetch64(byte[] s, int pos) {
        return toLongLE(s, pos);
    }

    private static long fetch32(byte[] s, int pos) {
        return toIntLE(s, pos);
    }

    private static int staticCastToInt(byte b) {
        return b & 0xFF;
    }

    private static long rotate(long val, int shift) {
        return shift == 0 ? val : (val >>> shift) | (val << (64 - shift));
    }

    private static long rotateByAtLeast1(long val, int shift) {
        return (val >>> shift) | (val << (64 - shift));
    }

    private static long shiftMix(long val) {
        return val ^ (val >>> 47);
    }

    private static final long kMul = 0x9ddfea08eb382d69L;

    private static long hash128to64(long u, long v) {
        long a = (u ^ v) * kMul;
        a ^= (a >>> 47);
        long b = (v ^ a) * kMul;
        b ^= (b >>> 47);
        b *= kMul;
        return b;
    }

    private static long hashLen16(long u, long v) {
        return hash128to64(u, v);
    }

    private static long hashLen0to16(byte[] s, int pos, int len) {
        if (len > 8) {
            long a = fetch64(s, pos + 0);
            long b = fetch64(s, pos + len - 8);
            return hashLen16(a, rotateByAtLeast1(b + len, len)) ^ b;
        }
        if (len >= 4) {
            long a = fetch32(s, pos + 0);
            return hashLen16((a << 3) + len, fetch32(s, pos + len - 4));
        }
        if (len > 0) {
            byte a = s[pos + 0];
            byte b = s[pos + (len >>> 1)];
            byte c = s[pos + len - 1];
            int y = staticCastToInt(a) + (staticCastToInt(b) << 8);
            int z = len + (staticCastToInt(c) << 2);
            return shiftMix(y * k2 ^ z * k3) * k2;
        }
        return k2;
    }

    private static long[] weakHashLen32WithSeeds(
            long w, long x, long y, long z,
            long a, long b) {
        a += w;
        b = rotate(b + a + z, 21);
        long c = a;
        a += x;
        a += y;
        b += rotate(a, 44);
        return new long[]{ a + z, b + c };
    }

    private static long[] weakHashLen32WithSeeds(byte[] s, int pos, long a, long b) {
        return weakHashLen32WithSeeds(
                fetch64(s, pos + 0),
                fetch64(s, pos + 8),
                fetch64(s, pos + 16),
                fetch64(s, pos + 24),
                a,
                b);
    }

    private static long[] cityMurmur(byte[] s, int pos, int len, long seed0, long seed1) {
        long a = seed0;
        long b = seed1;
        long c = 0;
        long d = 0;

        int l = len - 16;
        if (l <= 0) {
            a = shiftMix(a * k1) * k1;
            c = b * k1 + hashLen0to16(s, pos, len);
            d = shiftMix(a + (len >= 8 ? fetch64(s, pos + 0) : c));
        } else {
            c = hashLen16(fetch64(s, pos + len - 8) + k1, a);
            d = hashLen16(b + len, c + fetch64(s, pos + len - 16));
            a += d;

            do {
                a ^= shiftMix(fetch64(s, pos + 0) * k1) * k1;
                a *= k1;
                b ^= a;
                c ^= shiftMix(fetch64(s, pos + 8) * k1) * k1;
                c *= k1;
                d ^= c;
                pos += 16;
                l -= 16;
            } while (l > 0);
        }

        a = hashLen16(a, c);
        b = hashLen16(d, b);

        return new long[]{ a ^ b, hashLen16(b, a) };
    }

    private static long[] cityHash128WithSeed(byte[] s, int pos, int len, long seed0, long seed1) {
        if (len < 128) {
            return cityMurmur(s, pos, len, seed0, seed1);
        }

        long[] v = new long[2], w = new long[2];
        long x = seed0;
        long y = seed1;
        long z = k1 * len;
        v[0] = rotate(y ^ k1, 49) * k1 + fetch64(s, pos);
        v[1] = rotate(v[0], 42) * k1 + fetch64(s, pos + 8);
        w[0] = rotate(y + z, 35) * k1 + x;
        w[1] = rotate(x + fetch64(s, pos + 88), 53) * k1;

        // This is the same inner loop as CityHash64(), manually unrolled.
        do {
            x = rotate(x + y + v[0] + fetch64(s, pos + 16), 37) * k1;
            y = rotate(y + v[1] + fetch64(s, pos + 48), 42) * k1;
            x ^= w[1];
            y ^= v[0];
            z = rotate(z ^ w[0], 33);
            v = weakHashLen32WithSeeds(s, pos, v[1] * k1, x + w[0]);
            w = weakHashLen32WithSeeds(s, pos + 32, z + w[1], y);
            { long swap = z; z = x; x = swap; }
            pos += 64;

            x = rotate(x + y + v[0] + fetch64(s, pos + 16), 37) * k1;
            y = rotate(y + v[1] + fetch64(s, pos + 48), 42) * k1;
            x ^= w[1];
            y ^= v[0];
            z = rotate(z ^ w[0], 33);
            v = weakHashLen32WithSeeds(s, pos, v[1] * k1, x + w[0]);
            w = weakHashLen32WithSeeds(s, pos + 32, z + w[1], y);
            { long swap = z; z = x; x = swap; }
            pos += 64;

            len -= 128;
        } while (len >= 128);

        y += rotate(w[0], 37) * k0 + z;
        x += rotate(v[0] + z, 49) * k0;

        // If 0 < len < 128, hash up to 4 chunks of 32 bytes each from the end of s.
        for (int tail_done = 0; tail_done < len; ) {
            tail_done += 32;
            y = rotate(y - x, 42) * k0 + v[1];
            w[0] += fetch64(s, pos + len - tail_done + 16);
            x = rotate(x, 49) * k0 + w[0];
            w[0] += v[0];
            v = weakHashLen32WithSeeds(s, pos + len - tail_done, v[0], v[1]);
        }

        // At this point our 48 bytes of state should contain more than
        // enough information for a strong 128-bit hash.  We use two
        // different 48-byte-to-8-byte hashes to get a 16-byte final result.
        x = hashLen16(x, v[0]);
        y = hashLen16(y, w[0]);

        return new long[]{
                hashLen16(x + v[1], w[1]) + y,
                hashLen16(x + w[1], y + v[1])
        };
    }

    static long[] cityHash128(byte[] s, int pos, int len) {
        if (len >= 16) {
            return cityHash128WithSeed(
                    s, pos + 16,
                    len - 16,
                    fetch64(s, pos) ^ k3,
                    fetch64(s, pos + 8));
        } else if (len >= 8) {
            return cityHash128WithSeed(
                    new byte[0], 0, 0,
                    fetch64(s, pos) ^ (len * k0),
                    fetch64(s, pos + len - 8) ^ k1);
        } else {
            return cityHash128WithSeed(s, pos, len, k0, k1);
        }
    }
}
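To make the Java listing easier to follow, here is a minimal Python sketch of four of its building blocks (fetch64, rotate, shiftMix and hash128to64). Python integers are unbounded, so the sketch masks results to 64 bits to mimic Java's wrapping long arithmetic. The snake_case names and the MASK64 constant are my own choices for illustration, and this is not a full CityHash port.

```python
MASK64 = (1 << 64) - 1        # emulates Java's 64-bit wraparound (illustrative name)
K_MUL = 0x9ddfea08eb382d69    # kMul from the Java listing


def fetch64(data: bytes, pos: int) -> int:
    """Read 8 bytes starting at pos as a little-endian unsigned integer."""
    return int.from_bytes(data[pos:pos + 8], "little")


def rotate(val: int, shift: int) -> int:
    """Rotate a 64-bit value right by shift bits, as the Java rotate() does."""
    if shift == 0:
        return val
    return ((val >> shift) | (val << (64 - shift))) & MASK64


def shift_mix(val: int) -> int:
    """Fold the high bits of val into its low bits."""
    return val ^ (val >> 47)


def hash128_to_64(u: int, v: int) -> int:
    """Mix two 64-bit values into one, masking after every multiplication."""
    a = ((u ^ v) * K_MUL) & MASK64
    a ^= a >> 47
    b = ((v ^ a) * K_MUL) & MASK64
    b ^= b >> 47
    return (b * K_MUL) & MASK64


print(hex(fetch64(bytes(range(8)), 0)))   # 0x706050403020100
print(hex(rotate(1, 1)))                  # 0x8000000000000000
```

Because these values are kept unsigned, Python's plain >> behaves like Java's >>> here. A result would need to be reinterpreted as signed only when comparing it against a Java long printed in decimal.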
