rescue-prime:基于Goldilocks域的Rescue-Prime 哈希函数加速
1. 引言
前序博客:
- Goldilocks域
所谓计算友好的哈希函数,是指:
- 基于素数域元素,而不是 通常的如SHA3-256/SHA256/BLAKE3中的raw bits/bytes/N-bit words。原因是,在STARK证明系统中,基于素数域的计算电路更易于证明。
Rescue-Prime哈希函数为计算友好的哈希函数,目前已用于Winterfell STARK证明系统中。
在Winterfell STARK证明系统中,基于的素数域为Goldilocks域,即q=264−232+1q=2^{64}-2^{32}+1q=264−232+1,相应的哈希函数API接口为:
- 输入:N>0N>0N>0个元素,每个元素均∈Zq,q=264−232+1\in Z_q,q=2^{64}-2^{32}+1∈Zq,q=264−232+1。
- 输出:4个元素,每个元素均∈Zq,q=264−232+1\in Z_q,q=2^{64}-2^{32}+1∈Zq,q=264−232+1。
开源代码见:
- https://github.com/itzmeanjan/rescue-prime(C++,采用AVX2和NEON(高级SIMD)进行加速)
在https://github.com/itzmeanjan/rescue-prime中,做了scalar和vectorized Rescue两种实现。若目标CPU支持AVX2或NEON指令,则运行Rescue permutation将更快。
make benchmark # benchmarks scalar implementation
AVX2=1 make benchmark # benchmarks AVX2 implementation
NEON=1 make benchmark # benchmarks NEON implementation
CPU加速后,在各平台benchmark的性能为:
- On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( Scalar implementation compiled with GCC )
2022-12-22T15:39:15+00:00
Running ./bench/a.out
Run on (128 X 2072.24 MHz CPU s)
CPU Caches:L1 Data 48 KiB (x64)L1 Instruction 32 KiB (x64)L2 Unified 1280 KiB (x64)L3 Unified 55296 KiB (x2)
Load Average: 0.11, 0.04, 0.01
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 31499 ns 43123 ns 22224 31.7467k/s 33.988k 31.489k 31.363k
bench_rphash::hash/4/manual_time 32136 ns 36090 ns 21782 31.118k/s 38.102k 32.124k 31.998k
bench_rphash::hash/8/manual_time 31821 ns 39618 ns 22000 31.426k/s 42.13k 31.806k 31.681k
bench_rphash::hash/16/manual_time 63608 ns 79070 ns 11005 15.7213k/s 68.194k 63.588k 63.417k
bench_rphash::hash/32/manual_time 127188 ns 157987 ns 5504 7.8624k/s 129.934k 127.15k 126.867k
bench_rphash::hash/64/manual_time 254360 ns 315933 ns 2752 3.93144k/s 288.681k 254.272k 253.928k
bench_rphash::hash/128/manual_time 508638 ns 632906 ns 1376 1.96603k/s 522.51k 508.456k 507.97k
- On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( AVX2 implementation compiled with GCC )
2022-12-22T15:40:04+00:00
Running ./bench/a.out
Run on (128 X 1292.25 MHz CPU s)
CPU Caches:L1 Data 48 KiB (x64)L1 Instruction 32 KiB (x64)L2 Unified 1280 KiB (x64)L3 Unified 55296 KiB (x2)
Load Average: 0.13, 0.06, 0.01
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 13742 ns 25366 ns 50938 72.7718k/s 16.197k 13.738k 13.646k
bench_rphash::hash/4/manual_time 13756 ns 17694 ns 50886 72.6964k/s 21.138k 13.752k 13.677k
bench_rphash::hash/8/manual_time 13751 ns 21534 ns 50901 72.7217k/s 16.273k 13.747k 13.669k
bench_rphash::hash/16/manual_time 27480 ns 42949 ns 25473 36.3899k/s 32.871k 27.473k 27.356k
bench_rphash::hash/32/manual_time 54944 ns 85784 ns 12740 18.2003k/s 59.909k 54.927k 54.784k
bench_rphash::hash/64/manual_time 109873 ns 171524 ns 6371 9.10138k/s 185.71k 109.828k 109.609k
bench_rphash::hash/128/manual_time 219703 ns 344066 ns 3186 4.55159k/s 232.839k 219.643k 219.373k
- On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( Scalar implementation compiled with Clang )
2022-12-22T15:40:48+00:00
Running ./bench/a.out
Run on (128 X 1295.09 MHz CPU s)
CPU Caches:L1 Data 48 KiB (x64)L1 Instruction 32 KiB (x64)L2 Unified 1280 KiB (x64)L3 Unified 55296 KiB (x2)
Load Average: 0.14, 0.08, 0.02
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 9735 ns 22576 ns 71907 102.723k/s 12.152k 9.732k 9.673k
bench_rphash::hash/4/manual_time 9746 ns 14100 ns 71820 102.606k/s 18.543k 9.742k 9.681k
bench_rphash::hash/8/manual_time 9749 ns 18356 ns 71795 102.578k/s 11.831k 9.746k 9.681k
bench_rphash::hash/16/manual_time 19480 ns 36583 ns 35937 51.3342k/s 23.983k 19.474k 19.382k
bench_rphash::hash/32/manual_time 38931 ns 73003 ns 17979 25.6863k/s 42.868k 38.921k 38.769k
bench_rphash::hash/64/manual_time 77837 ns 145841 ns 8994 12.8474k/s 81.722k 77.814k 77.656k
bench_rphash::hash/128/manual_time 155657 ns 291537 ns 4497 6.4244k/s 158.186k 155.614k 155.404k
- On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( AVX2 implementation compiled with Clang )
2022-12-22T15:41:42+00:00
Running ./bench/a.out
Run on (128 X 2035.27 MHz CPU s)
CPU Caches:L1 Data 48 KiB (x64)L1 Instruction 32 KiB (x64)L2 Unified 1280 KiB (x64)L3 Unified 55296 KiB (x2)
Load Average: 0.21, 0.11, 0.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 9603 ns 22470 ns 72890 104.13k/s 17.661k 9.599k 9.527k
bench_rphash::hash/4/manual_time 9612 ns 13965 ns 72826 104.039k/s 18.814k 9.609k 9.535k
bench_rphash::hash/8/manual_time 9613 ns 18212 ns 72818 104.024k/s 16.387k 9.61k 9.532k
bench_rphash::hash/16/manual_time 19203 ns 36295 ns 36448 52.0739k/s 23.494k 19.198k 19.104k
bench_rphash::hash/32/manual_time 38372 ns 72446 ns 18242 26.0607k/s 41.034k 38.361k 38.22k
bench_rphash::hash/64/manual_time 76717 ns 144751 ns 9124 13.0349k/s 80.947k 76.692k 76.503k
bench_rphash::hash/128/manual_time 153405 ns 289330 ns 4563 6.51869k/s 178.467k 153.358k 153.099k
- On Intel® Core™ i5-8279U CPU @ 2.40GHz ( Scalar implementation compiled with Clang )
2022-12-22T19:43:28+04:00
Running ./bench/a.out
Run on (8 X 2400 MHz CPU s)
CPU Caches:L1 Data 32 KiBL1 Instruction 32 KiBL2 Unified 256 KiB (x4)L3 Unified 6144 KiB
Load Average: 1.19, 2.15, 2.89
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 11417 ns 137884 ns 61405 87.5911k/s 119.946k 11.218k 11.065k
bench_rphash::hash/4/manual_time 11567 ns 54524 ns 61425 86.4507k/s 120.79k 11.236k 11.071k
bench_rphash::hash/8/manual_time 11464 ns 95950 ns 61527 87.228k/s 145.688k 11.236k 11.081k
bench_rphash::hash/16/manual_time 22696 ns 189947 ns 30847 44.0598k/s 138.261k 22.402k 22.13k
bench_rphash::hash/32/manual_time 45439 ns 381059 ns 15416 22.0077k/s 191.966k 44.724k 44.247k
bench_rphash::hash/64/manual_time 91015 ns 763956 ns 7709 10.9872k/s 1.46905M 89.331k 88.494k
bench_rphash::hash/128/manual_time 188629 ns 1596825 ns 3871 5.30141k/s 435.33k 178.67k 176.944k
- On Intel® Core™ i5-8279U CPU @ 2.40GHz ( AVX2 implementation compiled with Clang )
2022-12-22T19:45:04+04:00
Running ./bench/a.out
Run on (8 X 2400 MHz CPU s)
CPU Caches:L1 Data 32 KiBL1 Instruction 32 KiBL2 Unified 256 KiB (x4)L3 Unified 6144 KiB
Load Average: 1.97, 2.11, 2.79
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 8858 ns 134181 ns 79128 112.898k/s 126.803k 8.722k 8.545k
bench_rphash::hash/4/manual_time 8874 ns 50963 ns 79380 112.692k/s 123.209k 8.724k 8.55k
bench_rphash::hash/8/manual_time 8880 ns 92534 ns 79036 112.609k/s 97.469k 8.728k 8.549k
bench_rphash::hash/16/manual_time 17671 ns 185488 ns 39656 56.59k/s 98.486k 17.376k 17.082k
bench_rphash::hash/32/manual_time 35122 ns 368258 ns 19880 28.4719k/s 183.345k 34.633k 34.112k
bench_rphash::hash/64/manual_time 70485 ns 737101 ns 10012 14.1873k/s 4.48529M 69.141k 68.175k
bench_rphash::hash/128/manual_time 140581 ns 1482180 ns 4985 7.11335k/s 330.147k 138.158k 136.303k
- On ARM Neoverse-V1 aka AWS Graviton3 ( Scalar implementation compiled with GCC )
2022-12-22T15:48:54+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:L1 Data 64 KiB (x64)L1 Instruction 64 KiB (x64)L2 Unified 1024 KiB (x64)L3 Unified 32768 KiB (x1)
Load Average: 0.08, 0.02, 0.01
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 17051 ns 31612 ns 41058 58.6473k/s 23.325k 17.024k 16.957k
bench_rphash::hash/4/manual_time 17107 ns 22032 ns 40920 58.4551k/s 27.503k 17.08k 17.013k
bench_rphash::hash/8/manual_time 17060 ns 26937 ns 41039 58.617k/s 38.753k 17.032k 16.963k
bench_rphash::hash/16/manual_time 34090 ns 53631 ns 20533 29.3341k/s 45.077k 34.035k 33.944k
bench_rphash::hash/32/manual_time 68140 ns 107193 ns 10274 14.6757k/s 82.014k 68.029k 67.896k
bench_rphash::hash/64/manual_time 136230 ns 214227 ns 5138 7.34055k/s 146.935k 136.014k 135.83k
bench_rphash::hash/128/manual_time 272428 ns 428993 ns 2570 3.6707k/s 283.816k 271.989k 271.714k
- On ARM Neoverse-V1 aka AWS Graviton3 ( NEON implementation compiled with GCC )
2022-12-22T15:50:11+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:L1 Data 64 KiB (x64)L1 Instruction 64 KiB (x64)L2 Unified 1024 KiB (x64)L3 Unified 32768 KiB (x1)
Load Average: 0.15, 0.05, 0.02
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 17119 ns 31696 ns 40891 58.4156k/s 50.317k 17.092k 17.019k
bench_rphash::hash/4/manual_time 17236 ns 22195 ns 40606 58.0188k/s 26.107k 17.21k 17.151k
bench_rphash::hash/8/manual_time 17152 ns 27013 ns 40812 58.3027k/s 40.492k 17.126k 17.056k
bench_rphash::hash/16/manual_time 34278 ns 53945 ns 20420 29.1735k/s 57.256k 34.225k 34.102k
bench_rphash::hash/32/manual_time 68511 ns 107882 ns 10215 14.5962k/s 92.947k 68.413k 68.261k
bench_rphash::hash/64/manual_time 136978 ns 215582 ns 5110 7.30046k/s 143.904k 136.786k 136.576k
bench_rphash::hash/128/manual_time 273909 ns 431178 ns 2555 3.65085k/s 286.895k 273.525k 273.209k
- On ARM Neoverse-V1 aka AWS Graviton3 ( Scalar implementation compiled with Clang )
2022-12-22T15:51:07+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:L1 Data 64 KiB (x64)L1 Instruction 64 KiB (x64)L2 Unified 1024 KiB (x64)L3 Unified 32768 KiB (x1)
Load Average: 0.13, 0.07, 0.02
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 17550 ns 32973 ns 39886 56.9786k/s 30.101k 17.521k 17.427k
bench_rphash::hash/4/manual_time 17560 ns 22696 ns 39864 56.948k/s 40.568k 17.53k 17.43k
bench_rphash::hash/8/manual_time 17555 ns 27777 ns 39855 56.9635k/s 26.61k 17.527k 17.433k
bench_rphash::hash/16/manual_time 35088 ns 55486 ns 19950 28.4996k/s 54.669k 35.032k 34.889k
bench_rphash::hash/32/manual_time 70136 ns 110887 ns 9978 14.258k/s 78.483k 70.023k 69.828k
bench_rphash::hash/64/manual_time 140230 ns 221700 ns 4992 7.13113k/s 149.666k 140.014k 139.697k
bench_rphash::hash/128/manual_time 280417 ns 443432 ns 2496 3.56611k/s 286.263k 280.025k 279.561k
- On ARM Neoverse-V1 aka AWS Graviton3 ( NEON implementation compiled with Clang )
2022-12-22T15:51:50+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:L1 Data 64 KiB (x64)L1 Instruction 64 KiB (x64)L2 Unified 1024 KiB (x64)L3 Unified 32768 KiB (x1)
Load Average: 0.15, 0.09, 0.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time 18277 ns 33684 ns 38300 54.7146k/s 30.028k 18.25k 18.154k
bench_rphash::hash/4/manual_time 18287 ns 23426 ns 38278 54.6826k/s 25.803k 18.26k 18.169k
bench_rphash::hash/8/manual_time 18290 ns 28515 ns 38273 54.6752k/s 42.531k 18.262k 18.158k
bench_rphash::hash/16/manual_time 36553 ns 56968 ns 19150 27.3573k/s 45.235k 36.5k 36.317k
bench_rphash::hash/32/manual_time 73058 ns 113815 ns 9581 13.6877k/s 95.43k 72.949k 72.773k
bench_rphash::hash/64/manual_time 146111 ns 227554 ns 4791 6.84411k/s 156.203k 145.895k 145.631k
bench_rphash::hash/128/manual_time 292153 ns 455092 ns 2396 3.42286k/s 302.318k 291.729k 291.347k
相关背景知识可参看:
- STARK入门知识
- Rescue-Prime: a Standard Specification (SoK)
- Winterfell中的Rescue Prime
- https://github.com/novifinancial/winterfell(Rust)
- Rescue Permutation test vectors
rescue-prime:基于Goldilocks域的Rescue-Prime 哈希函数加速相关推荐
- ACMNO.23 C语言-素数判定 写一个判断素数的函数,在主函数输入一个整数,输出是否是素数的消息。 输入 一个数 输出 如果是素数输出prime 如果不是输出not prime
题目描述 写一个判断素数的函数,在主函数输入一个整数,输出是否是素数的消息. 输入 一个数 输出 如果是素数输出prime 如果不是输出not prime 样例输入 97 样例输出 prime 来源/ ...
- mpeg b帧 编码 matlab,一种基于压缩域的镜头检测算法
文章编号: 1673- 5196( 2008) 06- 0097- 05 一种基于压缩域的镜头检测算法 摘要: 针对传统的非压缩域镜头检测算法数据量大.运算量大和效率低的缺点, 提出一种基于压缩域的镜 ...
- matlab信息隐藏算法,实验四--基于DCT域的信息隐藏算法
<实验四--基于DCT域的信息隐藏算法>由会员分享,可在线阅读,更多相关<实验四--基于DCT域的信息隐藏算法(6页珍藏版)>请在人人文库网上搜索. 1.实验四 基于DCT域的 ...
- 【Pytorch神经网络理论篇】 26 基于空间域的图卷积GCNs(ConvGNNs):定点域+谱域+图卷积的操作步骤
图卷积网络(Graph Convolutional Network,GCN)是一种能对图数据进行深度学习的方法.图卷积中的"图"是指数学(图论)中用顶点和边建立的有相关联系的拓扑图 ...
- 一种基于加密域的数字图像水印算法的设计与实现(附Matlab源码)
一种基于加密域的数字图像水印算法的设计与实现 项目介绍 毕设项目 题目:一种基于加密域的数字图像水印算法的设计与实现 随着数字媒体技术的发展,数字媒体版权的保护得到了越来越多人的重视,数字水印技术作为 ...
- 基于空间域的信息隐藏关键技术研究
实践题目:基于空间域的信息隐藏关键技术研究 目标是实现对320x240的灰度图像(样本自选,不能是lena图像)进行信息隐藏设计,应用空间域信息隐藏方法(例如LSB替换方法等)进行实验测试.对上述技术 ...
- Polygon zkEVM中Goldilocks域元素circom约束
1. 引言 前序博客有: Goldilocks域 Goldilocks域 p=264−232+1p= 2^{64} - 2^{32} + 1p=264−232+1. Polygon zkEVM中Gol ...
- 【论文摘要】一种基于NSPD-DCT域变参数混沌映射的零水印新方案
A Novel Zero-Watermarking Scheme Based on Variable Parameter Chaotic Mapping in NSPD-DCT Domain 标题:一 ...
- 基于信赖域的动态径向基函数代理模型优化策略
DRBF法回顾 TR-DRBF简介 拉丁超方设计 信頼域思想 算例 DRBF法回顾 在上一篇基于动态径向基函数(DRBF)代理模型的优化策略中我们简要介绍了DRBF算法,这种算法收敛次数少,但由于收敛 ...
最新文章
- 套娃成功!在《我的世界》里运行Win95、玩游戏,软件和教程现已公开
- 在spring MVC项目中集成Spring session redis (使用spring session框架,redis作为存储缓存)...
- 一个简单的PHP模板引擎
- Unknown SSL protocol error in connection to xxx:443
- x86服务器中网络性能分析与调优(高并发、大流量网卡调优)
- Asp.Net Core 混合全球化与本地化支持
- flask Form表单数据传递与取值
- python 爬虫性能_Python 爬虫性能相关总结
- 蓝桥杯 PREV-27 历届试题 蚂蚁感冒
- 13.看板方法---使用两层系统扩展看板
- php laravel框架失败_急急急!!!ubuntu+laravel+nginx安装完成后,请求laravel框架失败...
- Android系统 miui主题,MIUI 主题完全折腾指南
- 【STM32】ADC的DMA方式采集(16通道)
- 史上最强Js流程控制三大结构
- 伤疤好了有黑印怎么办_疤痕留下黑印怎么办
- PCIE设备的x1,x4,x8,x16有什么区别?
- 百度云有关Token
- cvs配电保护断路器_CVS100F断路器|施耐德CVS100F100A断路器
- dormer natalie_【图片】[Natalie Dormer]娜塔莉·多默尔【娜塔莉多默尔吧】_百度贴吧...
- 杰理之AC695_3.0.4_SDK做发射器连接接收器无声问题解决方法【篇】
热门文章
- 原生JS设置CSS样式有多少方式
- linux硬盘克隆 软件,分享|10 个免费的磁盘克隆软件
- Python工程师之JA3 指纹
- 下面linux程序中哪一个是调试器,【编程】noi2009笔试复习题(1)
- $\frac{dy}{dx}$ 是什么意思?
- SDRAM的数据存储实现并对其数据进行读写操作
- dnf电信区服务器位置,DNF新跨区计划更新 拍卖行全服关闭下架
- 20230606夏新(Amoi)的4K显示器D320B2000的亮点检测
- 特仑苏VS金典,解读高手过招
- js之深浅克隆(深浅拷贝)