
Off-Loading LET Generation to PEACH2: A Switching Hub for High Performance GPU Clusters

Published: 22 April 2016

Abstract

A hardware local essential tree (LET) generator for N-body simulation is implemented on the FPGA of PEACH2 (PCI Express Adaptive Communication Hub ver. 2), a low-latency switching hub for high-performance GPU clusters. Using pipelined on-the-fly execution with a multipole acceptance criterion (MAC) judging module and a data updating module, LET generation is 2.2 times faster than on the CPU. When data communication is also taken into account, the performance is 7.2 times that of the CPU case.
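As a rough software illustration of what LET generation with a MAC test involves, the sketch below walks a tree and, for a given remote domain, emits either a cell's multipole summary (when the MAC accepts it) or descends to its children. It is a minimal sketch under stated assumptions, not the hardware design from the paper: the Barnes-Hut style criterion `size < theta * distance`, the monopole-only summary, and all names (`Node`, `mac_accepts`, `generate_let`, `theta`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical tree cell: edge length, total mass, centre of mass."""
    size: float
    mass: float
    com: Vec
    children: List["Node"] = field(default_factory=list)


def min_distance_to_box(p: Vec, lo: Vec, hi: Vec) -> float:
    """Shortest distance from point p to the axis-aligned box [lo, hi]."""
    d2 = 0.0
    for x, a, b in zip(p, lo, hi):
        d = max(a - x, 0.0, x - b)  # 0 when x lies inside [a, b]
        d2 += d * d
    return d2 ** 0.5


def mac_accepts(node: Node, lo: Vec, hi: Vec, theta: float) -> bool:
    """Assumed MAC: the cell looks small enough from anywhere in the remote box."""
    return node.size < theta * min_distance_to_box(node.com, lo, hi)


def generate_let(root: Node, lo: Vec, hi: Vec, theta: float = 0.5):
    """Collect (mass, centre of mass) entries the remote domain would need."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Leaves are always sent; accepted cells are sent as one summary.
        if not node.children or mac_accepts(node, lo, hi, theta):
            out.append((node.mass, node.com))
        else:
            stack.extend(node.children)
    return out
```

In this sketch a distant remote domain receives a single summary entry for a whole subtree, while a nearby one receives the finer-grained children, which is why the size of the LET, and therefore the communication volume, depends on the MAC.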


  • Published in

    ACM SIGARCH Computer Architecture News  Volume 43, Issue 4
    HEART '15
    September 2015
    98 pages
    ISSN: 0163-5964
    DOI: 10.1145/2927964

    Copyright © 2016 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • column
