ABSTRACT
A zero-knowledge proof (ZKP) is a cryptographic primitive that allows a statement to be validated without disclosing any sensitive information, and it is foundational in applications such as verifiable outsourcing and digital currency. However, long proof generation times limit its widespread adoption. Even with GPU acceleration, proof generation can still take minutes, with Multi-Scalar Multiplication (MSM) accounting for about 78.2% of the workload. To address this, we present DistMSM, a novel MSM algorithm tailored for distributed multi-GPU systems. At the algorithmic level, DistMSM adapts Pippenger's algorithm to multi-GPU setups, identifying and addressing the bottlenecks that emerge during scaling. At the GPU kernel level, DistMSM introduces an elliptic curve arithmetic kernel tailored for contemporary GPU architectures. The kernel reduces register pressure with two innovative techniques and leverages tensor cores for specific big-integer multiplications. Compared to state-of-the-art MSM implementations, DistMSM offers an average 6.39× speedup across various elliptic curves and GPU counts. An MSM task that previously took seconds on a single GPU can now be completed in mere tens of milliseconds. These results showcase the substantial potential and efficiency of distributed multi-GPU systems for ZKP acceleration.
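For readers unfamiliar with the bucket method behind Pippenger's algorithm, the following is a minimal single-threaded sketch and not the DistMSM implementation. It assumes a toy stand-in group (integers under addition modulo a prime) in place of elliptic curve points, and the names `msm_naive`, `msm_pippenger`, `scalar_bits`, and `window_bits` are hypothetical, chosen only for illustration. It shows how an MSM, the sum of k_i · P_i, can be computed with group additions alone by splitting each scalar into fixed-width windows and accumulating points into per-digit buckets.

```python
def msm_naive(scalars, points, modulus):
    """Reference result: sum of k_i * P_i in the toy additive group Z_modulus."""
    return sum(k * p for k, p in zip(scalars, points)) % modulus


def msm_pippenger(scalars, points, modulus, scalar_bits, window_bits):
    """Bucket-method MSM using only group additions (here: addition mod modulus).

    Assumes every scalar satisfies 0 <= k < 2**scalar_bits.
    """
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    result = 0  # identity element of the toy group

    # Process windows from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Shift the running result up by one window: window_bits doublings.
        for _ in range(window_bits):
            result = (result + result) % modulus

        # Scatter: add each point into the bucket named by its scalar's digit.
        buckets = [0] * (mask + 1)
        for k, p in zip(scalars, points):
            digit = (k >> (w * window_bits)) & mask
            if digit:  # digit 0 contributes nothing
                buckets[digit] = (buckets[digit] + p) % modulus

        # Reduce: compute sum_j j * buckets[j] with a running suffix sum,
        # so no scalar multiplication is needed.
        running = 0
        window_sum = 0
        for j in range(mask, 0, -1):
            running = (running + buckets[j]) % modulus
            window_sum = (window_sum + running) % modulus

        result = (result + window_sum) % modulus
    return result
```

In this sketch the windows and the bucket accumulation within each window are independent of one another, which is the kind of parallelism a GPU or multi-GPU design can exploit; how DistMSM actually partitions the work is not described in the abstract.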
Index Terms
- Accelerating Multi-Scalar Multiplication for Efficient Zero Knowledge Proofs with Multi-GPU Systems