Implementation and Performance Evaluation of Memory System Using Addressable Cache for HPC Applications on HBM2 Equipped FPGAs

Fujita, Norihisa; Kobayashi, Ryohei; Yamaguchi, Yoshiki; Boku, Taisuke

doi:10.1007/978-3-031-31209-0_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13835)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

When field-programmable gate arrays (FPGAs) are used as HPC accelerators, their memory bandwidth presents a significant challenge because it falls short of that of other HPC accelerators. In this paper, we propose a memory system for HPC applications on HBM2-equipped FPGAs that uses block RAMs (BRAMs) as an addressable cache placed between HBM2 and the application. This architecture enables bulk data transfer between HBM2 and the cache and allows an application to perform fast random access on the BRAMs. This study presents the implementation and performance evaluation of the new memory system for HPC and HBM2 on an FPGA. Furthermore, we describe the API that can be used to control this system from the host. We implement RISC-V cores in the FPGA as controllers to realize fine-grained data transfer control and to avoid the overhead of the PCI Express bus. The proposed system is implemented on eight memory channels and achieves a bandwidth of 102.7 GB/s. It exceeds the memory bandwidth of conventional FPGA boards with four channels of DDR4 memory despite using only 8 of the 32 HBM2 channels.
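The addressable-cache idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design or host API: the class `AddressableCache`, its methods `load` and `store`, and the helper `per_channel_bandwidth` are hypothetical names introduced here. It shows the access pattern the abstract describes (bulk transfer from HBM2 into the cache, random access on the cache, bulk write-back) and the simple arithmetic for the average per-channel share of the reported 102.7 GB/s.

```python
"""Toy software model of an addressable cache (all names are hypothetical)."""


class AddressableCache:
    """Explicitly managed buffer between a large memory and an application."""

    def __init__(self, hbm, cache_words):
        self.hbm = hbm                    # list standing in for HBM2
        self.bram = [0] * cache_words     # list standing in for the BRAM cache

    def _check(self, hbm_addr, cache_addr, length):
        # Reject transfers that would fall outside either memory.
        if min(hbm_addr, cache_addr, length) < 0:
            raise ValueError("addresses and length must be non-negative")
        if hbm_addr + length > len(self.hbm):
            raise ValueError("transfer exceeds HBM2 range")
        if cache_addr + length > len(self.bram):
            raise ValueError("transfer exceeds cache range")

    def load(self, hbm_addr, cache_addr, length):
        """Bulk transfer: HBM2 -> cache."""
        self._check(hbm_addr, cache_addr, length)
        self.bram[cache_addr:cache_addr + length] = \
            self.hbm[hbm_addr:hbm_addr + length]

    def store(self, cache_addr, hbm_addr, length):
        """Bulk transfer: cache -> HBM2."""
        self._check(hbm_addr, cache_addr, length)
        self.hbm[hbm_addr:hbm_addr + length] = \
            self.bram[cache_addr:cache_addr + length]


def per_channel_bandwidth(total_gb_s, channels):
    """Average bandwidth per channel, assuming an even split."""
    return total_gb_s / channels


def demo():
    hbm = list(range(64))
    cache = AddressableCache(hbm, cache_words=16)
    cache.load(hbm_addr=32, cache_addr=0, length=16)   # bulk read
    # Non-sequential accesses touch only the cache: reverse the block in place.
    for i in range(8):
        j = 15 - i
        cache.bram[i], cache.bram[j] = cache.bram[j], cache.bram[i]
    cache.store(cache_addr=0, hbm_addr=32, length=16)  # bulk write-back
    return hbm


if __name__ == "__main__":
    print(demo()[32:48])
    print(per_channel_bandwidth(102.7, 8))
```

In this model the application never issues scattered accesses to the large memory; it only requests contiguous blocks, which is the property that lets the real system keep HBM2 transfers in bulk. The even split in `per_channel_bandwidth` is an assumption made for illustration; the abstract reports only the aggregate figure.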

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 21H04869. We also thank the Intel University Program for providing hardware and software.

Author information

Authors and Affiliations

Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi & Taisuke Boku
Degree Programs in Systems and Information Engineering, Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi & Taisuke Boku

Corresponding author

Correspondence to Norihisa Fujita.

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, UK
Jeremy Singer
University of Glasgow, Glasgow, UK
Yehia Elkhatib
University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora Blanco Heras
Louisiana State University, Baton Rouge, LA, USA
Patrick Diehl
University of Edinburgh, Edinburgh, UK
Nick Brown
Universidade de Lisboa, Lisbon, Portugal
Aleksandar Ilic

About this paper

Cite this paper

Fujita, N., Kobayashi, R., Yamaguchi, Y., Boku, T. (2023). Implementation and Performance Evaluation of Memory System Using Addressable Cache for HPC Applications on HBM2 Equipped FPGAs. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_9

DOI: https://doi.org/10.1007/978-3-031-31209-0_9
Published: 02 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer Science, Computer Science (R0)
