A software bridged data transfer on a FPGA cluster by using pipelining and InfiniBand verbs

Authors:
Takaaki Miyajima
RIKEN Center for Computational Science, Kobe, Hyogo, Japan

Tomoya Hirao
Fixstars Corporation, Ohsaki, Shinagawa-ku, Tokyo, Japan

Naoya Miyamoto
Fixstars Corporation, Ohsaki, Shinagawa-ku, Tokyo, Japan

Jeongdo Son
Fixstars Corporation, Ohsaki, Shinagawa-ku, Tokyo, Japan

Kentaro Sano
RIKEN Center for Computational Science, Kobe, Hyogo, Japan

HEART '19: Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies
June 2019
Article No.: 11
Pages 1–6
https://doi.org/10.1145/3337801.3337808

Published: 06 June 2019

ABSTRACT

Heterogeneous systems with Field Programmable Gate Arrays (FPGAs) are attracting attention in the High-Performance Computing (HPC) area. When an FPGA is used as an accelerator attached to a host CPU, an FPGA cluster can be constructed in many configurations, for example with different network topologies. Sustained data transfer bandwidth between FPGA memory and CPU memory on a distant node is one of the most important factors in deciding the topology of an FPGA cluster. To explore the best topology, a quantitative evaluation of bandwidth is required. We measured bandwidth on two host nodes connected via a 100 Gbps InfiniBand cable, one of which has a PCIe Gen3 x8-based FPGA accelerator card. We implemented a Direct Memory Access (DMA) function on the FPGA-attached node and a software bridged data transfer function to transfer data between the two nodes. The results show that the DMA function and the software bridged data transfer function achieve 82.2% and 69.6%, respectively, of the theoretical bandwidth of PCIe Gen3 x8, which is the bottleneck of the data transfer path.
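To put the reported percentages in absolute terms, the following is a minimal sketch of the arithmetic. It rests on assumptions that the abstract does not state: that the theoretical PCIe Gen3 x8 bandwidth is the one-direction link rate of 8 GT/s per lane with 128b/130b encoding and no packet or protocol overhead, and that the 100 Gbps InfiniBand figure is the nominal data rate. The function names are hypothetical, and the absolute GB/s values are derived here, not reported by the authors.

```python
# Sketch only: derives absolute bandwidths from the percentages in the abstract.
# Assumptions (not from the paper): 8 GT/s per lane, 128b/130b line encoding,
# no TLP/protocol overhead, GB = 10^9 bytes.

GT_PER_LANE = 8e9          # PCIe Gen3 transfer rate per lane (transfers/s)
ENCODING = 128 / 130       # PCIe Gen3 line-encoding efficiency


def pcie_gen3_bandwidth_gb(lanes: int) -> float:
    """Theoretical one-direction PCIe Gen3 bandwidth in GB/s for `lanes` lanes."""
    bits_per_second = GT_PER_LANE * lanes * ENCODING
    return bits_per_second / 8 / 1e9


def achieved_gb(percent_of_peak: float, peak_gb: float) -> float:
    """Convert a percentage of peak into an absolute bandwidth in GB/s."""
    return peak_gb * percent_of_peak / 100.0


pcie_peak = pcie_gen3_bandwidth_gb(lanes=8)
infiniband_nominal = 100e9 / 8 / 1e9       # 100 Gbps expressed in GB/s

dma = achieved_gb(82.2, pcie_peak)          # DMA function (from the abstract)
bridged = achieved_gb(69.6, pcie_peak)      # software bridged transfer (from the abstract)

print(f"PCIe Gen3 x8 peak (assumed model): {pcie_peak:.2f} GB/s")
print(f"InfiniBand nominal:                {infiniband_nominal:.2f} GB/s")
print(f"DMA function:                      {dma:.2f} GB/s")
print(f"Software bridged transfer:         {bridged:.2f} GB/s")
```

Under these assumptions the PCIe link peaks at roughly 7.88 GB/s, below the 12.5 GB/s nominal rate of the InfiniBand link, which is consistent with the abstract naming PCIe Gen3 x8 as the bottleneck. If the paper uses a different baseline for the theoretical bandwidth, the absolute values shift accordingly.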

Published in

HEART '19: Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies
June 2019
106 pages
ISBN: 9781450372558
DOI: 10.1145/3337801

Copyright © 2019 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
HEART '19 Paper Acceptance Rate: 12 of 29 submissions, 41%
Overall Acceptance Rate: 22 of 50 submissions, 44%