1. Introduction

Previous posts:

  • The Goldilocks field

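A computation-friendly hash function is one that:

  • operates on prime-field elements, rather than on the raw bits/bytes/N-bit words used by conventional hash functions such as SHA3-256, SHA256 and BLAKE3. The reason is that in a STARK proof system, computations expressed over a prime field are easier to prove: the constraints are polynomials over that field, whereas bitwise operations must first be decomposed into many field constraints.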

Rescue-Prime is a computation-friendly hash function of this kind and is currently used in the Winterfell STARK proof system.
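To make "operating on prime-field elements" concrete, here is a minimal sketch of arithmetic in the Goldilocks field $q = 2^{64} - 2^{32} + 1$ used by Winterfell. For readability it reduces through the GCC/Clang `unsigned __int128` extension instead of the specialized Goldilocks reduction a fast implementation would use, and it assumes every input is already canonical (in $[0, q)$). As an example of a Rescue-Prime-style nonlinear step, it also computes the power map $x \mapsto x^7$ and its inverse. The exponent $\alpha = 7$ is an assumption here (to our knowledge it is the value Winterfell uses over Goldilocks); `INV_ALPHA` is $7^{-1} \bmod (q - 1)$, so raising to it undoes the 7th power. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Goldilocks prime q = 2^64 - 2^32 + 1.
constexpr uint64_t Q = 0xFFFFFFFF00000001ULL;

// Assumed S-box exponent and its inverse modulo q - 1 (7 * INV_ALPHA = 4(q - 1) + 1).
constexpr uint64_t ALPHA = 7;
constexpr uint64_t INV_ALPHA = 10540996611094048183ULL;

using u128 = unsigned __int128;  // GCC/Clang extension

// All inputs are assumed canonical, i.e. already in [0, q).
uint64_t fadd(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>((static_cast<u128>(a) + b) % Q);
}

uint64_t fsub(uint64_t a, uint64_t b) {
    // Borrow q when a < b so the result stays in [0, q).
    return a >= b ? a - b : static_cast<uint64_t>(static_cast<u128>(a) + Q - b);
}

uint64_t fmul(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>((static_cast<u128>(a) * b) % Q);
}

// Square-and-multiply exponentiation in Z_q.
uint64_t fpow(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1) result = fmul(result, base);
        base = fmul(base, base);
        exp >>= 1;
    }
    return result;
}

// Forward power map x -> x^7 and its inverse x -> x^(1/7).
uint64_t sbox(uint64_t x) { return fpow(x, ALPHA); }
uint64_t inv_sbox(uint64_t x) { return fpow(x, INV_ALPHA); }
```

A production implementation would replace the 128-bit `%` with a reduction that exploits the special form of q (since $2^{64} \equiv 2^{32} - 1 \pmod q$), which is where most of the speed comes from.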

In the Winterfell STARK proof system, the underlying prime field is the Goldilocks field, i.e. $q = 2^{64} - 2^{32} + 1$, and the corresponding hash function API is:

  • Input: $N > 0$ elements, each in $\mathbb{Z}_q$.
  • Output: 4 elements, each in $\mathbb{Z}_q$.
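The API above has the shape of a sponge: field elements are absorbed into part of a permutation state, and the digest is read back out. The minimal sketch below shows that shape, assuming the layout Winterfell uses for its Goldilocks Rescue-Prime hash to our knowledge (a 12-element state split into 4 capacity and 8 rate elements, with a 4-element digest). The permutation is passed in as a callback, padding and length encoding are omitted, and the names `hash_elements`, `Permutation`, `State` and `Digest` are hypothetical; this is not the API of Winterfell or of itzmeanjan/rescue-prime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

constexpr uint64_t Q = 0xFFFFFFFF00000001ULL;  // Goldilocks prime

// Assumed sponge layout: 12-element state = 4 capacity + 8 rate lanes.
constexpr std::size_t CAPACITY = 4;
constexpr std::size_t RATE = 8;
constexpr std::size_t WIDTH = CAPACITY + RATE;
constexpr std::size_t DIGEST_LEN = 4;

using State = std::array<uint64_t, WIDTH>;
using Digest = std::array<uint64_t, DIGEST_LEN>;
// Stand-in for the Rescue permutation, supplied by the caller.
using Permutation = std::function<void(State&)>;

uint64_t fadd(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>((static_cast<unsigned __int128>(a) + b) % Q);
}

// Hypothetical sponge-style hash: N > 0 elements of Z_q in, 4 elements out.
// Padding and length encoding are deliberately omitted in this sketch.
Digest hash_elements(const std::vector<uint64_t>& input, const Permutation& permute) {
    if (input.empty()) throw std::invalid_argument("N must be > 0");
    State state{};  // zero-initialized
    std::size_t lane = 0;
    for (uint64_t x : input) {
        if (x >= Q) throw std::invalid_argument("element is not in Z_q");
        // Absorb: add the element into the next rate lane.
        state[CAPACITY + lane] = fadd(state[CAPACITY + lane], x);
        if (++lane == RATE) {  // a full block of RATE elements
            permute(state);
            lane = 0;
        }
    }
    if (lane > 0) permute(state);  // one more call for a trailing partial block
    // Squeeze: read the digest from the first DIGEST_LEN rate lanes.
    Digest out{};
    for (std::size_t k = 0; k < DIGEST_LEN; ++k) out[k] = state[CAPACITY + k];
    return out;
}
```

With a rate of 8, hashing N elements costs ⌈N/8⌉ permutation calls. This is consistent with the benchmarks later in this post: hash/4 and hash/8 take about as long as a single permutation, hash/16 about twice as long, and hash/128 about 16 times as long.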

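Open-source code:

  • https://github.com/itzmeanjan/rescue-prime (C++, with AVX2 and NEON (Advanced SIMD) acceleration)

https://github.com/itzmeanjan/rescue-prime contains both a scalar and a vectorized implementation of Rescue. On a target CPU with AVX2 or NEON support, the vectorized implementation can run the Rescue permutation faster, although the actual gain depends on the platform and compiler. The benchmarks are run with: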

make benchmark # benchmarks scalar implementation
AVX2=1 make benchmark # benchmarks AVX2 implementation
NEON=1 make benchmark # benchmarks NEON implementation


Benchmark results for the scalar and SIMD (AVX2/NEON) builds on each platform are as follows:

  • On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( Scalar implementation compiled with GCC )
2022-12-22T15:39:15+00:00
Running ./bench/a.out
Run on (128 X 2072.24 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x64)
  L1 Instruction 32 KiB (x64)
  L2 Unified 1280 KiB (x64)
  L3 Unified 55296 KiB (x2)
Load Average: 0.11, 0.04, 0.01
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      31499 ns        43123 ns        22224       31.7467k/s            33.988k               31.489k            31.363k
bench_rphash::hash/4/manual_time           32136 ns        36090 ns        21782        31.118k/s            38.102k               32.124k            31.998k
bench_rphash::hash/8/manual_time           31821 ns        39618 ns        22000        31.426k/s             42.13k               31.806k            31.681k
bench_rphash::hash/16/manual_time          63608 ns        79070 ns        11005       15.7213k/s            68.194k               63.588k            63.417k
bench_rphash::hash/32/manual_time         127188 ns       157987 ns         5504        7.8624k/s           129.934k               127.15k           126.867k
bench_rphash::hash/64/manual_time         254360 ns       315933 ns         2752       3.93144k/s           288.681k              254.272k           253.928k
bench_rphash::hash/128/manual_time        508638 ns       632906 ns         1376       1.96603k/s            522.51k              508.456k            507.97k
  • On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( AVX2 implementation compiled with GCC )
2022-12-22T15:40:04+00:00
Running ./bench/a.out
Run on (128 X 1292.25 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x64)
  L1 Instruction 32 KiB (x64)
  L2 Unified 1280 KiB (x64)
  L3 Unified 55296 KiB (x2)
Load Average: 0.13, 0.06, 0.01
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      13742 ns        25366 ns        50938       72.7718k/s            16.197k               13.738k            13.646k
bench_rphash::hash/4/manual_time           13756 ns        17694 ns        50886       72.6964k/s            21.138k               13.752k            13.677k
bench_rphash::hash/8/manual_time           13751 ns        21534 ns        50901       72.7217k/s            16.273k               13.747k            13.669k
bench_rphash::hash/16/manual_time          27480 ns        42949 ns        25473       36.3899k/s            32.871k               27.473k            27.356k
bench_rphash::hash/32/manual_time          54944 ns        85784 ns        12740       18.2003k/s            59.909k               54.927k            54.784k
bench_rphash::hash/64/manual_time         109873 ns       171524 ns         6371       9.10138k/s            185.71k              109.828k           109.609k
bench_rphash::hash/128/manual_time        219703 ns       344066 ns         3186       4.55159k/s           232.839k              219.643k           219.373k
  • On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( Scalar implementation compiled with Clang )
2022-12-22T15:40:48+00:00
Running ./bench/a.out
Run on (128 X 1295.09 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x64)
  L1 Instruction 32 KiB (x64)
  L2 Unified 1280 KiB (x64)
  L3 Unified 55296 KiB (x2)
Load Average: 0.14, 0.08, 0.02
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time       9735 ns        22576 ns        71907       102.723k/s            12.152k                9.732k             9.673k
bench_rphash::hash/4/manual_time            9746 ns        14100 ns        71820       102.606k/s            18.543k                9.742k             9.681k
bench_rphash::hash/8/manual_time            9749 ns        18356 ns        71795       102.578k/s            11.831k                9.746k             9.681k
bench_rphash::hash/16/manual_time          19480 ns        36583 ns        35937       51.3342k/s            23.983k               19.474k            19.382k
bench_rphash::hash/32/manual_time          38931 ns        73003 ns        17979       25.6863k/s            42.868k               38.921k            38.769k
bench_rphash::hash/64/manual_time          77837 ns       145841 ns         8994       12.8474k/s            81.722k               77.814k            77.656k
bench_rphash::hash/128/manual_time        155657 ns       291537 ns         4497        6.4244k/s           158.186k              155.614k           155.404k
  • On Intel® Xeon® Platinum 8375C CPU @ 2.90GHz ( AVX2 implementation compiled with Clang )
2022-12-22T15:41:42+00:00
Running ./bench/a.out
Run on (128 X 2035.27 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x64)
  L1 Instruction 32 KiB (x64)
  L2 Unified 1280 KiB (x64)
  L3 Unified 55296 KiB (x2)
Load Average: 0.21, 0.11, 0.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time       9603 ns        22470 ns        72890        104.13k/s            17.661k                9.599k             9.527k
bench_rphash::hash/4/manual_time            9612 ns        13965 ns        72826       104.039k/s            18.814k                9.609k             9.535k
bench_rphash::hash/8/manual_time            9613 ns        18212 ns        72818       104.024k/s            16.387k                 9.61k             9.532k
bench_rphash::hash/16/manual_time          19203 ns        36295 ns        36448       52.0739k/s            23.494k               19.198k            19.104k
bench_rphash::hash/32/manual_time          38372 ns        72446 ns        18242       26.0607k/s            41.034k               38.361k             38.22k
bench_rphash::hash/64/manual_time          76717 ns       144751 ns         9124       13.0349k/s            80.947k               76.692k            76.503k
bench_rphash::hash/128/manual_time        153405 ns       289330 ns         4563       6.51869k/s           178.467k              153.358k           153.099k
  • On Intel® Core™ i5-8279U CPU @ 2.40GHz ( Scalar implementation compiled with Clang )
2022-12-22T19:43:28+04:00
Running ./bench/a.out
Run on (8 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB
  L1 Instruction 32 KiB
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB
Load Average: 1.19, 2.15, 2.89
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      11417 ns       137884 ns        61405       87.5911k/s           119.946k               11.218k            11.065k
bench_rphash::hash/4/manual_time           11567 ns        54524 ns        61425       86.4507k/s            120.79k               11.236k            11.071k
bench_rphash::hash/8/manual_time           11464 ns        95950 ns        61527        87.228k/s           145.688k               11.236k            11.081k
bench_rphash::hash/16/manual_time          22696 ns       189947 ns        30847       44.0598k/s           138.261k               22.402k             22.13k
bench_rphash::hash/32/manual_time          45439 ns       381059 ns        15416       22.0077k/s           191.966k               44.724k            44.247k
bench_rphash::hash/64/manual_time          91015 ns       763956 ns         7709       10.9872k/s           1.46905M               89.331k            88.494k
bench_rphash::hash/128/manual_time        188629 ns      1596825 ns         3871       5.30141k/s            435.33k               178.67k           176.944k
  • On Intel® Core™ i5-8279U CPU @ 2.40GHz ( AVX2 implementation compiled with Clang )
2022-12-22T19:45:04+04:00
Running ./bench/a.out
Run on (8 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB
  L1 Instruction 32 KiB
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB
Load Average: 1.97, 2.11, 2.79
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time       8858 ns       134181 ns        79128       112.898k/s           126.803k                8.722k             8.545k
bench_rphash::hash/4/manual_time            8874 ns        50963 ns        79380       112.692k/s           123.209k                8.724k              8.55k
bench_rphash::hash/8/manual_time            8880 ns        92534 ns        79036       112.609k/s            97.469k                8.728k             8.549k
bench_rphash::hash/16/manual_time          17671 ns       185488 ns        39656         56.59k/s            98.486k               17.376k            17.082k
bench_rphash::hash/32/manual_time          35122 ns       368258 ns        19880       28.4719k/s           183.345k               34.633k            34.112k
bench_rphash::hash/64/manual_time          70485 ns       737101 ns        10012       14.1873k/s           4.48529M               69.141k            68.175k
bench_rphash::hash/128/manual_time        140581 ns      1482180 ns         4985       7.11335k/s           330.147k              138.158k           136.303k
  • On ARM Neoverse-V1 aka AWS Graviton3 ( Scalar implementation compiled with GCC )
2022-12-22T15:48:54+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.08, 0.02, 0.01
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      17051 ns        31612 ns        41058       58.6473k/s            23.325k               17.024k            16.957k
bench_rphash::hash/4/manual_time           17107 ns        22032 ns        40920       58.4551k/s            27.503k                17.08k            17.013k
bench_rphash::hash/8/manual_time           17060 ns        26937 ns        41039        58.617k/s            38.753k               17.032k            16.963k
bench_rphash::hash/16/manual_time          34090 ns        53631 ns        20533       29.3341k/s            45.077k               34.035k            33.944k
bench_rphash::hash/32/manual_time          68140 ns       107193 ns        10274       14.6757k/s            82.014k               68.029k            67.896k
bench_rphash::hash/64/manual_time         136230 ns       214227 ns         5138       7.34055k/s           146.935k              136.014k            135.83k
bench_rphash::hash/128/manual_time        272428 ns       428993 ns         2570        3.6707k/s           283.816k              271.989k           271.714k
  • On ARM Neoverse-V1 aka AWS Graviton3 ( NEON implementation compiled with GCC )
2022-12-22T15:50:11+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.15, 0.05, 0.02
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      17119 ns        31696 ns        40891       58.4156k/s            50.317k               17.092k            17.019k
bench_rphash::hash/4/manual_time           17236 ns        22195 ns        40606       58.0188k/s            26.107k                17.21k            17.151k
bench_rphash::hash/8/manual_time           17152 ns        27013 ns        40812       58.3027k/s            40.492k               17.126k            17.056k
bench_rphash::hash/16/manual_time          34278 ns        53945 ns        20420       29.1735k/s            57.256k               34.225k            34.102k
bench_rphash::hash/32/manual_time          68511 ns       107882 ns        10215       14.5962k/s            92.947k               68.413k            68.261k
bench_rphash::hash/64/manual_time         136978 ns       215582 ns         5110       7.30046k/s           143.904k              136.786k           136.576k
bench_rphash::hash/128/manual_time        273909 ns       431178 ns         2555       3.65085k/s           286.895k              273.525k           273.209k
  • On ARM Neoverse-V1 aka AWS Graviton3 ( Scalar implementation compiled with Clang )
2022-12-22T15:51:07+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.13, 0.07, 0.02
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      17550 ns        32973 ns        39886       56.9786k/s            30.101k               17.521k            17.427k
bench_rphash::hash/4/manual_time           17560 ns        22696 ns        39864        56.948k/s            40.568k                17.53k             17.43k
bench_rphash::hash/8/manual_time           17555 ns        27777 ns        39855       56.9635k/s             26.61k               17.527k            17.433k
bench_rphash::hash/16/manual_time          35088 ns        55486 ns        19950       28.4996k/s            54.669k               35.032k            34.889k
bench_rphash::hash/32/manual_time          70136 ns       110887 ns         9978        14.258k/s            78.483k               70.023k            69.828k
bench_rphash::hash/64/manual_time         140230 ns       221700 ns         4992       7.13113k/s           149.666k              140.014k           139.697k
bench_rphash::hash/128/manual_time        280417 ns       443432 ns         2496       3.56611k/s           286.263k              280.025k           279.561k
  • On ARM Neoverse-V1 aka AWS Graviton3 ( NEON implementation compiled with Clang )
2022-12-22T15:51:50+00:00
Running ./bench/a.out
Run on (64 X 2100 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.15, 0.09, 0.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations items_per_second max_exec_time (ns) median_exec_time (ns) min_exec_time (ns)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_rphash::permutation/manual_time      18277 ns        33684 ns        38300       54.7146k/s            30.028k                18.25k            18.154k
bench_rphash::hash/4/manual_time           18287 ns        23426 ns        38278       54.6826k/s            25.803k                18.26k            18.169k
bench_rphash::hash/8/manual_time           18290 ns        28515 ns        38273       54.6752k/s            42.531k               18.262k            18.158k
bench_rphash::hash/16/manual_time          36553 ns        56968 ns        19150       27.3573k/s            45.235k                 36.5k            36.317k
bench_rphash::hash/32/manual_time          73058 ns       113815 ns         9581       13.6877k/s             95.43k               72.949k            72.773k
bench_rphash::hash/64/manual_time         146111 ns       227554 ns         4791       6.84411k/s           156.203k              145.895k           145.631k
bench_rphash::hash/128/manual_time        292153 ns       455092 ns         2396       3.42286k/s           302.318k              291.729k           291.347k
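To summarize what the SIMD builds buy on each platform, the sketch below divides the scalar `bench_rphash::permutation` time by the corresponding AVX2/NEON time, using the "Time" column of the results above (a ratio above 1 means the SIMD build is faster). The struct and function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Scalar vs. SIMD timings of bench_rphash::permutation ("Time" column, ns),
// copied from the benchmark output above.
struct PermTiming {
    std::string platform;
    double scalar_ns;
    double simd_ns;
};

const std::vector<PermTiming> kTimings = {
    {"Xeon 8375C, GCC, AVX2", 31499.0, 13742.0},
    {"Xeon 8375C, Clang, AVX2", 9735.0, 9603.0},
    {"i5-8279U, Clang, AVX2", 11417.0, 8858.0},
    {"Graviton3, GCC, NEON", 17051.0, 17119.0},
    {"Graviton3, Clang, NEON", 17550.0, 18277.0},
};

// Ratio > 1 means the SIMD build is faster than the scalar build.
double speedup(const PermTiming& t) { return t.scalar_ns / t.simd_ns; }

void print_speedups() {
    for (const auto& t : kTimings)
        std::printf("%-24s %.2fx\n", t.platform.c_str(), speedup(t));
}
```

The ratios come out at roughly 2.29 (Xeon, GCC), 1.01 (Xeon, Clang), 1.29 (i5, Clang), 1.00 (Graviton3, GCC) and 0.96 (Graviton3, Clang). AVX2 helps most with GCC on the Xeon (where Clang's scalar build is already faster than GCC's AVX2 build), moderately with Clang on the i5, and barely with Clang on the Xeon; the NEON build brings no gain on Graviton3 in these runs.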

For related background, see:

  • STARK basics
  • Rescue-Prime: a Standard Specification (SoK)
  • Rescue Prime in Winterfell
  • https://github.com/novifinancial/winterfell (Rust)
  • Rescue Permutation test vectors
