1. 引言

零知识证明中，需要 prove the knowledge of a preimage under a cryptographic hash function，针对的场景为：

public info：h,Hh,Hh,H
private info：xxx
待证明relation：h=H(x)h=H(x)h=H(x)

其中H(∗)H(*)H(∗) 为hash函数。

1.1 hash函数属性

Hash函数的属性有：

second pre-image resistance：对于 特定的 m1m_1m1，很难找到不同的m2m_2m2，使得H(m1)=H(m2)H(m_1)=H(m_2)H(m1)=H(m2)。【攻击方不可选择m1m_1m1】
collision resistance：对于 任意的 m1m_1m1，很难找到不同的m2m_2m2，使得H(m1)=H(m2)H(m_1)=H(m_2)H(m1)=H(m2)。【攻击方可自由选择m1,m2m_1,m_2m1,m2，仅要求m1≠m2m_1\neq m_2m1=m2而H(m1)=H(m2)H(m_1)=H(m_2)H(m1)=H(m2)。】

因此collision resistance 包含了 second pre-image resistance。

1.2 hash函数分类

Hash函数的类型可分为：

hash to bit string。
hash to elliptic curve，即hash结果为an elliptic curve point。

hash to bit string 的应用场景有：

Schnorr签名中：s=k+H(P,R,m)⋅pks=k+H(P,R,m)\cdot pks=k+H(P,R,m)⋅pk，详细可参见 ECDSA VS Schnorr signature VS BLS signature。
SNARK中：借助Fiat-Shamir转换（通过hash函数）来实现non-interactive SNARK。

hash to elliptic curve 的应用场景有：

VRF（verifiable random function 可验证随机数）中：有 H=hash_to_curve(Q,m)H=hash\_to\_curve(Q,m)H=hash_to_curve(Q,m)。
PAKE（Password authenticated key exchange）中：有session key K=H(K)\mathcal{K}=H(K)K=H(K)。
BLS signature中：有Gm=H(m)G_m=H(m)Gm=H(m)，详细可参见 ECDSA VS Schnorr signature VS BLS signature。

1.3 之前主流hash函数

之前主流的hash函数主要有：

SHA256：详细可参看博客 SHA256算法原理详解。
SHA3
BLAKE2
BLAKE3
Pedersen hash：效率高，但是其输入上限为32字节。仅满足second pre-image resistance，可能存在collision resistance。

根据 https://github.com/BLAKE3-team/BLAKE3，有：

2. 为何需要设计新的hash函数？

已有Blake2/3和SHA2/3 等hash函数，为何需要研究针对zero-knowledge proof的hash函数——MiMC/Poseidon等等？主要原因在于：

通常，生成 a zero-knowledge proof的complexity主要取决于statement中所包含的乘法门数量。因此，新的hash函数应使乘法复杂度最小化，从而简化ZKP生成。【如在Zcash Sapling中shielded payment中的zkSNARK，其大多数的proving time和proving complexity来自于 proving statements about invocations of Pedersen and Blake2b hash函数。】
Merkle-tree实现中，需要一个能接受64字节输入的hash函数。64字节为：2个32字节 hash输出。

2018年7月2日，以太坊基金会给StarkWare团队2年的赞助，用于寻找新的STARK friendly hash (SFH) 函数，可用于在区块链中构建transparent且抗量子安全的proof系统。其要求很高，具体为：

Prover应能为同一hash函数的多次调用生成STARK proofs，在4核1G RAM主机上，相应的proof creation rate不低于 100 hashes/second。
10万个hashes 合并为一个STARK proof 在单核CPU上的验证时间应不超过10ms，且相应的proof size不高于200KB。

这几年出现的hash函数有：
（1）hash to bit string 的hash函数
1.1）受MPC启发的low-complexity hash函数：

MiMC hash：Albrecht等人 2016年论文《MiMC: Efficient Encryption and Cryptographic Hashing with Minimal Multiplicative Complexity》。
GMiMC hash： Albrecht等人2019年论文《Feistel Structures for MPC, and More》。

1.2）受HadesMiMC启发的hash函数：包含了Starkad (binary field)、Poseidon (prime field)，由IAIK团队的研究者设计。

Starkad/Poseidon：Grassi等人2019年论文《POSEIDON: A New Hash Function for Zero-Knowledge Proof Systems》。

1.3）受MARVELlous启发的hash函数：包含了Jarvis、Vision (binary field)、Pepper、Rescue (prime field)，由COSIC团队的研究者设计。

Vision/Rescue：Aly等人2019年论文《Efficient Symmetric Primitives for Advanced Cryptographic Protocols (A Marvellous Contribution)》。
Rescue Prime hash：Szepieniec等人2020年论文《Rescue-Prime: a Standard Specification (SoK)》。

（2）hash to elliptic curve的hash函数

Wahby和Boneh等人2019年论文《Fast and simple constant-time hashing to the BLS12-381 elliptic curve》。相应的代码实现参见：https://github.com/kwantam/bls12-381_hash 以及 https://github.com/algorand/bls_sigs_ref 。

目前的SFH候选函数有：

MiMC家族
GMiMC家族
HadesMiMC家族
MARVELlous家族

经过一系列安全分析，目前StarkWare主要关注2类候选函数：【考虑到STARK中的算术运算，倾向于选择基于小素数域而不是大域或者二进制。】

MiMC家族
MARVELlous家族

在Ben-Sasson等人2020年论文《STARK Friendly Hash – Survey and Recommendation》中，推荐了MARVELlous家族中的基于素数域size约为2612^{61}261的Rescue hash函数作为以太坊基金的标准SFH。

3. 各hash函数性能对比

3.1 各STARK friendly hash函数性能对比

Ben-Sasson等人2020年论文《STARK Friendly Hash – Survey and Recommendation》中有：

3.2 Poseidon VS Rescue VS Rescue Prime

在https://github.com/matter-labs/rescue-poseidon 中，在1GHz Intel Core i5上的性能对比为：

hashes	1x permutation runtime (μs)	1x permutation gates	number of rounds
Poseidon	13	166	8f + 33p
Rescue	680	266	44f
Rescue Prime	300	104	9f

3.3 SHA256 VS Pedersen VS MiMC

在以太坊的zkSNARKs toolbox ZoKrates 中，分别实现了以下3种hash函数的ZKP circuit：

SHA256：目前在以太坊中，SHA256已以预编译合约的形式实现，因此在EVM中进行SHA256运算是便宜的。但是在circuit中实现相对来说仍是昂贵的，因为SHA256运算重度依赖于bit操作，需要定义binary inputs和binary outputs。
SHA256属于SHA2家族，SHA2家族的hash函数都可认为是伪随机的。
对于512bit message，SHA2的constraints数为27,446，生成proof的时间约为1.5秒，即每个bit对应约53.6个constraints。
Pedersen hash：受Pedersen commitment启发，Pedersen hash函数的安全性依赖于discrete logarithm problem。其输入为fixed-length binary，输出为a group element，如在ZoKrates中为a point on the BabyJubJub elliptic curve。
Pedersen hash不具备伪随机性，因此不可用于random oracle等场景。
在EVM中，未实现对BabyJubJub curve的原生支持。因此在链上执行Pedersen hash是昂贵的，应尽量避免。
具有nnn个generators GiG_iGi，消息mmm表示为nnn bits mim_imi，相应的Pedersen hash为：
m1G1+m2G2+⋯+mnGnm_1G_1+m_2G_2+\cdots +m_nG_nm1G1+m2G2+⋯+mnGn
对于512bit message，Pedersen hash 的constraints数约为898个，即每个bit对应约1.7个constraints。
MiMC hash：为使用MiMC-Feistel permutation over a prime field in a sponge construction来实现的secure and efficiently provable hash函数。
MiMC hash构建过程中借鉴了对称密码学中现有的hash函数设计原则，
MiMC hash可认为是伪随机的。
由于MiMC hash本身采用素数域运算，因此其在circuit中是高效的，同时在EVM中也可以以很低的开销计算。
MiMC hash的输入为field elements，输出也为field elements，因此以其输出为输入时，无需packing/unpacking操作，可节约相应的开销。

3.4 TurboPlonk’s SHA256/MiMC VS Groth16’s SHA256/MiMC

代码：https://github.com/AZTECProtocol/barretenberg

AZTEC团队在 PLONK Benchmarks I — 2.5x faster than Groth16 on MiMC 中指出，其TurboPlonk的MiMC hash实现比Groth16中的hash速度快2.5倍。

3.5 智能合约中基于Poseidon/MiMC/SHA26的incremental merkle tree证明对比

已审计的incrementalMerkleTree合约实现见：https://github.com/appliedzkp/maci/blob/36387984f15a448152f5bb4227db978c75082e59/contracts/sol/IncrementalMerkleTree.sol

以太坊团队 Gas and circuit constraint benchmarks of binary and quinary incremental Merkle trees using the Poseidon hash function
中，基于 circomlib库，相应对比为：

根据 https://github.com/weijiekoh/poseidon_benchmarks 采用了circomlib 0.5.1版本中的Poseidon实现，npm run benchmark的性能为：

Gas used to deploy PoseidonT3: 2156516
Gas used to deploy PoseidonT6: 2156516
Gas used by hash2(): 55543
Gas used by hash5(): 125425
// R1CS constraints for Poseidon (t=3)
[INFO]  snarkJS: Curve: bn-128
[INFO]  snarkJS: # of Wires: 243
[INFO]  snarkJS: # of Constraints: 240
[INFO]  snarkJS: # of Private Inputs: 0
[INFO]  snarkJS: # of Public Inputs: 2
[INFO]  snarkJS: # of Labels: 1108
[INFO]  snarkJS: # of Outputs: 1
// R1CS constraints for Poseidon (t=6)
[INFO]  snarkJS: Curve: bn-128
[INFO]  snarkJS: # of Wires: 327
[INFO]  snarkJS: # of Constraints: 321
[INFO]  snarkJS: # of Private Inputs: 0
[INFO]  snarkJS: # of Public Inputs: 5
[INFO]  snarkJS: # of Labels: 2071
[INFO]  snarkJS: # of Outputs: 1

参考资料

[1] 2020年2月版本 Report on the Security of STARK-friendly Hash Functions (Version 2.0)
[2] Grassi等人2019年论文 POSEIDON: A New Hash Function for Zero-Knowledge Proof Systems
[3] ZCash讨论 evaluate SNARK-friendly hash functions, e.g. MiMC or the MARVELlous family #2233
[4] 以太坊研究 Gas and circuit constraint benchmarks of binary and quinary incremental Merkle trees using the Poseidon hash function
[5] Ben-Sasson等人2020年论文 STARK Friendly Hash – Survey and Recommendation
[6] AZTEC博客 PLONK Benchmarks I — 2.5x faster than Groth16 on MiMC
[7] O(1) Labs snarks < 3 hash functions
[8] Why invent new hash functions for zero-knowledge proofs?
[9] Comparison of SNARK-friendly hash algorithms MiMC7, Poseidon, Pederson?
[10] 以太坊matters-lab团队实现的 Rescue, Poseidon, Rescue Prime
[11] SNARK-friendly one-way compression functions
[12] MiMC介绍
[13] StarkWare 2019年博客 STARK-Friendly Hash
[14] 以太坊研究 Cheap hash functions for zkSNARK merkle-tree proofs, which can be calculated on-chain
[15] Ashur等人2018年论文 MARVELlous: a STARK-Friendly Family of Cryptographic Primitives

snark/stark-friendly hash函数相关推荐

Hash函数的安全性
我们为了保证消息的完整性,引进了散列函数,那么散列函数会对安全正造成什么影响呢?这是需要好好研究一番的问题. 三个概念: 1.如果y<>x,且h(x)=h(y),则称为碰撞. 2.对于给定 ...
Hash 函数资源链接汇总
Hash 链接: ［1］General Purpose Hash Function Algorithms:http://www.partow.net/programming/hashfunctions ...
HashMap中的hash函数
在写一个HashSet时候有个需求,是判断HashSet中是否已经存在对象,存在则取出,不存在则add添加.HashSet也是通过HashMap实现,只用了HashMap的key,value都存储一个 ...
文本去重之MinHash算法——就是多个hash函数对items计算特征值，然后取最小的计算相似度...
来源:http://my.oschina.net/pathenon/blog/65210 1.概述跟SimHash一样,MinHash也是LSH的一种,可以用来快速估算两个集合的相似度.MinHas ...
Cuckoo hash算法分析——其根本思想和bloom filter一致增加hash函数来解决碰撞节省了空间但代价是查找次数增加...
基本思想: cuckoo hash是一种解决hash冲突的方法,其目的是使用简单的hash 函数来提高hash table的利用率,同时保证O(1)的查询时间基本思想是使用2个hash函数来处理碰撞 ...
Hash函数及其应用
本文部分内容摘自网络,参考资料链接会在文后给出,在此感谢原作者的分享. 计算理论中,没有Hash函数的说法,只有单向函数的说法.所谓的单向函数,是一个复杂的定义,大家可以去看计算理论或者密码学方面的数 ...
常见的一些 Hash 函数
Hash的主要原理就是把大范围映射到小范围:所以,你输入的实际值的个数必须和小范围相当或者比它更小.不然冲突就会很多. 不同的应用对Hash函数有着不同的要求:比如,用于加密的Hash函数主要考虑它和 ...
认知理论与技术（hash函数，SHA，MD
概述: 认证是个过程,通过这个过程一个实体向另一个实体证明某种声称的属性认证参数: "口令"相当于平时登录系统时输入的密码. "密钥"相当于公钥密码体制中的私 ...
密码学基础知识（六）Hash函数与消息认证
Hash函数和消息认证先说Hash 哈希函数,可以将任意长度的消息压缩为某一固定长度的消息摘要函数.一句话,Hash简直了. 当然有逆天的一面就有大缺点,过程不可逆.傻了吧,哈哈. Hash的性质: ...

snark/stark-friendly hash函数