ABSTRACT
Heterogeneous systems with Field-Programmable Gate Arrays (FPGAs) are gathering attention in the High-Performance Computing (HPC) area. When an FPGA is used as an accelerator attached to a host CPU, an FPGA cluster can be constructed in many configurations, for example with different network topologies. The sustained data transfer bandwidth between FPGA memory and CPU memory on a distant node is one of the most important factors in deciding the topology of an FPGA cluster. To explore the best topology, a quantitative evaluation of this bandwidth is required. We measured bandwidth on two host nodes connected via a 100 Gbps InfiniBand cable, one of which has a PCIe Gen3 x8-based FPGA accelerator card. We implemented a Direct Memory Access (DMA) function on the FPGA-attached node and a software-bridged data transfer function to transfer data between the two nodes. The results show that the DMA function and the software-bridged data transfer function achieve 82.2% and 69.6%, respectively, of the theoretical bandwidth of PCIe Gen3 x8, which is the bottleneck of the data transfer path.
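To put the reported percentages in absolute terms, the following is a minimal sketch of the arithmetic. It assumes that "theoretical bandwidth" means the raw PCIe Gen3 line rate (8 GT/s per lane, 128b/130b encoding) with no packet or protocol overhead subtracted, and that the InfiniBand link is taken at its nominal 100 Gbps. The abstract does not state how the theoretical figure was defined, and all function and constant names here are illustrative.

```python
# Sketch only: assumes the "theoretical bandwidth" is the raw PCIe Gen3 line
# rate after 128b/130b encoding, without TLP/DLLP protocol overhead.

PCIE_GEN3_TRANSFERS_PER_S = 8.0e9      # 8 GT/s per lane (PCIe Gen3)
PCIE_GEN3_ENCODING = 128.0 / 130.0     # 128b/130b line encoding
INFINIBAND_BITS_PER_S = 100.0e9        # nominal 100 Gbps link from the abstract


def pcie_gen3_peak_bytes_per_s(lanes: int = 8) -> float:
    """Peak payload-carrying rate of a PCIe Gen3 link in bytes per second."""
    bits_per_s = PCIE_GEN3_TRANSFERS_PER_S * PCIE_GEN3_ENCODING * lanes
    return bits_per_s / 8.0


def infiniband_peak_bytes_per_s() -> float:
    """Nominal InfiniBand link rate in bytes per second."""
    return INFINIBAND_BITS_PER_S / 8.0


def path_bottleneck_bytes_per_s(lanes: int = 8) -> float:
    """The slowest hop bounds the end-to-end transfer rate."""
    return min(pcie_gen3_peak_bytes_per_s(lanes), infiniband_peak_bytes_per_s())


def sustained_bytes_per_s(fraction_of_peak: float, lanes: int = 8) -> float:
    """Convert a reported fraction of the PCIe peak into bytes per second."""
    return fraction_of_peak * pcie_gen3_peak_bytes_per_s(lanes)


if __name__ == "__main__":
    peak = pcie_gen3_peak_bytes_per_s()
    print(f"PCIe Gen3 x8 peak:     {peak / 1e9:.2f} GB/s")
    print(f"InfiniBand nominal:    {infiniband_peak_bytes_per_s() / 1e9:.2f} GB/s")
    print(f"DMA (82.2%):           {sustained_bytes_per_s(0.822) / 1e9:.2f} GB/s")
    print(f"Software bridge (69.6%): {sustained_bytes_per_s(0.696) / 1e9:.2f} GB/s")
```

Under these assumptions the PCIe Gen3 x8 peak is about 7.88 GB/s, below the nominal 12.5 GB/s of the InfiniBand link, which is consistent with PCIe being the bottleneck. The reported fractions then correspond to roughly 6.47 GB/s for the DMA function and 5.48 GB/s for the software-bridged transfer.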