DOI: 10.1145/3624062.3625534
short-paper

Hardware Specialization: Estimating Monte Carlo Cross-Section Lookup Kernel Performance and Area

Published: 12 November 2023

Abstract

Hardware specialization is one of the promising directions in the post-Moore era, and it is imperative to understand how hardware specialization paradigms can benefit HPC. An essential question is how to estimate the theoretical performance of an optimally specialized architecture without requiring extensive hardware development expertise and effort. Focusing on the Monte Carlo cross-section lookup kernel, known for its notably low resource utilization, we develop a workflow that leverages open-source hardware tools to simulate a specialized architecture’s timing and estimate its resource usage. We implement building blocks of the kernel pipeline in the Chisel hardware construction language and generate Verilog code for resource estimation. Our late-breaking results show that the specialized kernel has a latency of 46 cycles per lookup, compared with 680 cycles for the optimized CPU code, and that roughly 15k pipeline copies could fit within a 698 mm² die, matching the die size of the Intel Xeon Platinum 8180.
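
The headline figures can be related with simple arithmetic. The sketch below is a back-of-envelope calculation only: it divides the quoted cycle counts and spreads the die area evenly across the quoted pipeline copies. It assumes both designs run at the same clock frequency (the reported numbers are cycles, not time) and that the roughly 15k copies fill the whole 698 mm² die with no area reserved for memory, interconnect, or I/O. The function names are hypothetical and not taken from the paper.

```python
# Back-of-envelope figures derived from the numbers quoted in the abstract.
# Assumption: equal clock frequencies, so cycle counts compare directly.
CPU_CYCLES_PER_LOOKUP = 680       # optimized CPU code, cycles per lookup
PIPELINE_CYCLES_PER_LOOKUP = 46   # simulated specialized pipeline latency
DIE_AREA_MM2 = 698.0              # die area matching the Xeon Platinum 8180
PIPELINE_COPIES = 15_000          # approximate number of copies quoted ("15k")


def latency_ratio(cpu_cycles: int, pipeline_cycles: int) -> float:
    """How many times fewer cycles one pipeline needs per lookup than the CPU code."""
    return cpu_cycles / pipeline_cycles


def area_per_copy_mm2(die_area_mm2: float, copies: int) -> float:
    """Area budget per pipeline copy, assuming the copies evenly fill the die."""
    return die_area_mm2 / copies


if __name__ == "__main__":
    ratio = latency_ratio(CPU_CYCLES_PER_LOOKUP, PIPELINE_CYCLES_PER_LOOKUP)
    area = area_per_copy_mm2(DIE_AREA_MM2, PIPELINE_COPIES)
    print(f"latency ratio: {ratio:.1f}x")
    print(f"area per copy: {area:.4f} mm^2")
```

Under these assumptions a single pipeline finishes a lookup in about 1/15 of the CPU's cycles, and each copy has an area budget of roughly 0.047 mm². Neither figure accounts for clock-rate differences or for the cost of feeding that many copies with data.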

Supplemental Material

MP4 File
Recording of "Hardware Specialization: Estimating Monte Carlo Cross-Section Lookup Kernel Performance and Area" presentation at PMBS23.

Cited By

  • (2023) Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware. Computer Physics Communications, 109072. DOI: 10.1016/j.cpc.2023.109072. Online publication date: Dec-2023.

Information

Published In

ACM Other conferences
SC-W '23: Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN:9798400707858
DOI:10.1145/3624062
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Chisel
  2. Cross Section Lookup Benchmark
  3. Hardware Specialization
  4. Monte Carlo Simulation
  5. Performance Analysis

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SC-W 2023

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Feb 2025
