DOI: 10.1145/3337801.3337813

Scaling Performance for N-Body Stream Computation with a Ring of FPGAs

Published: 06 June 2019

ABSTRACT

Field-Programmable Gate Arrays (FPGAs) offer a comparatively non-invasive way to build custom architectures specialized for a particular application domain. Recent studies have demonstrated that single-node FPGAs can rival both CPUs and GPUs in performance. Unfortunately, most existing studies limit themselves to a single FPGA device, and the scalability of such designs requires further investigation.

In this work, we demonstrate in practice how to scale the important n-body problem across a comparatively large FPGA cluster. Our design, composed of up to 256 processing elements, achieves near-linear strong scaling, with performance comparable to that of custom Application-Specific Integrated Circuits (ASICs). We further develop an analytical performance model, which we use to predict the performance of our solution on upcoming Intel Agilex systems. Today, our system reaches up to 47 Giga-Pairs/second, and our performance model predicts a peak of up to 0.142 Tera-Pairs/second on next-generation FPGAs.
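The pairs-per-second figures above admit a simple back-of-the-envelope check. For a ring of fully pipelined processing elements, aggregate throughput is roughly the number of elements times the clock frequency, assuming each element retires one particle-pair interaction per cycle; the Python sketch below illustrates this arithmetic. The one-pair-per-cycle assumption and the implied clock frequency are illustrative and are not taken from the paper, which develops its own analytical performance model.

    # Back-of-the-envelope throughput estimate for a ring of fully pipelined
    # n-body processing elements (PEs). Illustrative sketch only, not the
    # authors' analytical model: it assumes each PE retires one particle-pair
    # interaction per clock cycle and that the ring keeps every PE busy.

    def pairs_per_second(num_pes: int, clock_hz: float, pairs_per_cycle: float = 1.0) -> float:
        """Aggregate pair-interaction throughput of num_pes pipelined PEs."""
        return num_pes * clock_hz * pairs_per_cycle

    def time_per_step(num_particles: int, throughput: float) -> float:
        """Wall-clock time for one all-pairs (N^2) force evaluation."""
        return num_particles ** 2 / throughput

    if __name__ == "__main__":
        reported = 47e9                 # 47 Giga-pairs/s, as reported in the abstract
        num_pes = 256                   # processing elements, as reported in the abstract
        clock = reported / num_pes      # implied clock under the one-pair-per-cycle assumption
        print(f"implied clock: {clock / 1e6:.0f} MHz")
        print(f"throughput: {pairs_per_second(num_pes, clock) / 1e9:.1f} Gpairs/s")
        # Time for one direct-summation step with one million particles at that rate:
        print(f"1M-particle step: {time_per_step(1_000_000, reported):.1f} s")

Under the same assumption, reaching the predicted 0.142 Tera-Pairs/second would require roughly a threefold increase in pipeline count, clock rate, or some combination of the two.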


Published in

HEART '19: Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies
June 2019, 106 pages
ISBN: 9781450372558
DOI: 10.1145/3337801

Copyright © 2019 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 6 June 2019

Acceptance Rates

HEART '19 paper acceptance rate: 12 of 29 submissions, 41%. Overall acceptance rate: 22 of 50 submissions, 44%.
