
Off-Loading LET Generation to PEACH2: A Switching Hub for High Performance GPU Clusters

Authors:
Chiharu Tsuruta (Keio University, Yokohama, Japan)
Yohei Miki (University of Tsukuba, Tsukuba, Japan)
Takuya Kuhara (Keio University, Yokohama, Japan)
Hideharu Amano (Keio University, Yokohama, Japan)
Masayuki Umemura (University of Tsukuba, Tsukuba, Japan)

ACM SIGARCH Computer Architecture News, Volume 43, Issue 4, September 2015, pp. 3–8. https://doi.org/10.1145/2927964.2927966

Published: 22 April 2016


Abstract

A hardware local essential tree (LET) generator for N-body simulation is implemented on the FPGA of PEACH2 (PCI Express Adaptive Communication Hub ver2), a low-latency switching hub for high-performance GPU clusters. Using pipelined on-the-fly execution with a multipole acceptance criterion judging module and a data updating module, LET generation is 2.2 times faster than on the CPU. When data communication is also taken into account, performance is 7.2 times that of the CPU case.
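To make the role of the multipole acceptance criterion in LET generation concrete, here is a minimal software sketch. It is not the authors' FPGA pipeline. It assumes the classic Barnes-Hut opening-angle test (cell size divided by distance, compared against a threshold `theta`), and the names `Node`, `distance_to_box`, `mac_accepts`, and `generate_let` are hypothetical, introduced only for illustration. The abstract does not say which criterion or data layout the hardware uses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Node:
    """One cell of a local octree (hypothetical layout, for illustration)."""
    center_of_mass: Vec
    mass: float
    size: float  # edge length of the cubic cell
    children: List["Node"] = field(default_factory=list)


def distance_to_box(p: Vec, lo: Vec, hi: Vec) -> float:
    """Shortest distance from point p to the axis-aligned box [lo, hi]."""
    d2 = 0.0
    for x, a, b in zip(p, lo, hi):
        d = max(a - x, 0.0, x - b)  # 0 when x lies inside [a, b]
        d2 += d * d
    return d2 ** 0.5


def mac_accepts(node: Node, lo: Vec, hi: Vec, theta: float) -> bool:
    """Assumed opening-angle test: the cell looks small enough from the box."""
    d = distance_to_box(node.center_of_mass, lo, hi)
    return d > 0.0 and node.size / d < theta


def generate_let(root: Node, lo: Vec, hi: Vec,
                 theta: float = 0.5) -> List[Tuple[Vec, float]]:
    """Collect the (center of mass, mass) entries a remote domain needs.

    A cell that passes the test is sent as a single entry; otherwise it is
    opened and its children are examined. Leaves are always sent.
    """
    let: List[Tuple[Vec, float]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children or mac_accepts(node, lo, hi, theta):
            let.append((node.center_of_mass, node.mass))
        else:
            stack.extend(node.children)
    return let


if __name__ == "__main__":
    leaf_a = Node((-0.25, 0.0, 0.0), 1.0, 0.5)
    leaf_b = Node((0.25, 0.0, 0.0), 1.0, 0.5)
    root = Node((0.0, 0.0, 0.0), 2.0, 1.0, [leaf_a, leaf_b])
    # A distant domain receives the root as one entry.
    print(len(generate_let(root, (10.0, 10.0, 10.0), (11.0, 11.0, 11.0))))
    # A nearby domain forces the root to be opened.
    print(len(generate_let(root, (1.0, 0.0, 0.0), (2.0, 1.0, 1.0))))
```

In this sketch the per-node accept-or-open decision is the part that corresponds to a criterion-judging stage. Each node is tested independently of the others, which is one reason such a test can plausibly be streamed through a hardware pipeline.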


Published in

ACM SIGARCH Computer Architecture News, Volume 43, Issue 4 (HEART '15), September 2015, 98 pages
ISSN: 0163-5964
DOI: 10.1145/2927964
Editor: Doug DeGroot
Copyright © 2016 Authors
Publisher: Association for Computing Machinery, New York, NY, United States