DOI: 10.1145/3555776.3577701
research-article

BcBench: Exploring Throughput Processor Designs based on Blockchain Benchmarking

Published: 07 June 2023

Abstract

Benchmark suites have become a primary driver of research on GPU architecture design. The most popular GPU benchmark suites, such as Rodinia and Parboil, cover a wide range of real-world applications and give researchers a good reference for estimating the performance behavior of GPU devices in production. However, the existing GPU benchmarks were proposed a decade ago and cannot accurately reflect the distinctive features of the latest GPU applications. It is therefore inappropriate to use these stale benchmarks as the reference for evaluating state-of-the-art GPU architectures.
To tackle this challenge, we propose BcBench, a new set of GPU workloads collected from emerging blockchain applications. BcBench is designed to estimate how future GPU architectures will perform when accelerating these emerging applications. To this end, we have ported BcBench to popular GPU simulators (e.g., GPGPU-Sim) for in-depth performance analysis. We first fully characterize the blockchain workloads at the GPU micro-architecture level, using the detailed statistics generated by the GPU simulator. We then draw five key observations from the workload characterization. Finally, we explore five future GPU architecture designs that target blockchain applications.

      Published In

      SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
      March 2023
      1932 pages
      ISBN:9781450395175
      DOI:10.1145/3555776
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. blockchain
      2. benchmarking
      3. GPU architecture

      Qualifiers

      • Research-article

      Funding Sources

      • NSFC

      Conference

      SAC '23

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
