Abstract
A hardware local essential tree (LET) generator for N-body simulation is implemented on the FPGA of PEACH2 (PCI Express Adaptive Communication Hub ver2), a low-latency switching hub for high-performance GPU clusters. Using pipelined on-the-fly execution with a multipole acceptance criterion judging module and a data updating module, LET generation is 2.2 times faster than on the CPU. When data communication is included, performance is 7.2 times that of the CPU-based case.
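To make the role of the multipole acceptance criterion concrete, the following is a minimal software sketch of LET generation, not the hardware design described here. It assumes the classic Barnes-Hut opening-angle criterion (cell size < theta × distance); the abstract does not state which criterion the FPGA module uses. The names `Node`, `dist_to_box`, `mac_accepts`, and `build_let`, the default `theta = 0.5`, and the use of the center of mass for the distance test are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical tree cell: center of mass, edge length, total mass."""
    center: Vec
    size: float
    mass: float
    children: List["Node"] = field(default_factory=list)


def dist_to_box(p: Vec, lo: Vec, hi: Vec) -> float:
    """Distance from point p to the axis-aligned box [lo, hi] (0 if inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))


def mac_accepts(node: Node, lo: Vec, hi: Vec, theta: float) -> bool:
    """Assumed opening-angle test against the remote domain's bounding box."""
    return node.size < theta * dist_to_box(node.center, lo, hi)


def build_let(root: Node, lo: Vec, hi: Vec, theta: float = 0.5) -> List[Node]:
    """Collect the cells a remote domain [lo, hi] needs from the local tree."""
    let: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children or mac_accepts(node, lo, hi, theta):
            let.append(node)              # leaf or far enough: send as is
        else:
            stack.extend(node.children)   # too close: open the cell
    return let


# Usage: one root cell with two leaf children.
leaf_a = Node(center=(-1.0, 0.0, 0.0), size=2.0, mass=1.0)
leaf_b = Node(center=(1.0, 0.0, 0.0), size=2.0, mass=1.0)
root = Node(center=(0.0, 0.0, 0.0), size=4.0, mass=2.0,
            children=[leaf_a, leaf_b])

far_let = build_let(root, lo=(100.0, 0.0, 0.0), hi=(101.0, 1.0, 1.0))
near_let = build_let(root, lo=(1.0, 0.0, 0.0), hi=(2.0, 1.0, 1.0))
```

For a distant domain the root alone passes the test, so a single cell is sent; for a nearby domain the root is opened and its children are sent instead. In a pipelined design, the acceptance test and the update of the output list could plausibly be separate stages processing cells as they stream through, which is the kind of split the judging and updating modules suggest.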