Implementation and Performance Evaluation of Memory System Using Addressable Cache for HPC Applications on HBM2 Equipped FPGAs

  • Conference paper
Euro-Par 2022: Parallel Processing Workshops (Euro-Par 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13835))

Abstract

When field programmable gate arrays (FPGAs) are used as HPC accelerators, their memory bandwidth presents a significant challenge because it does not match that of other HPC accelerators. In this paper, we propose a memory system for HPC applications on HBM2-equipped FPGAs that uses block RAMs (BRAMs) as an addressable cache placed between HBM2 and the application. This architecture enables bulk data transfer between HBM2 and the cache and allows the application to exploit fast random access on the BRAMs. We present the implementation and performance evaluation of this memory system on an FPGA. Furthermore, we describe the API used to control the system from the host. We implement RISC-V cores in the FPGA as controllers to realize fine-grained data transfer control and to avoid the overhead incurred by the PCI Express bus. The proposed system is implemented on eight memory channels and achieves a bandwidth of 102.7 GB/s. This exceeds the memory bandwidth of conventional FPGA boards with four channels of DDR4 memory, even though only 8 of the 32 HBM2 channels are used.
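As a rough illustration of the addressable-cache idea described in the abstract, the following Python sketch models a software-managed buffer: data moves between a large backing memory and a small cache only through explicit bulk transfers, and the application then accesses the cache in any order. This is a toy software model, not the authors' hardware design or API. The class `AddressableCache`, its `load` and `store` methods, the helper `reverse_block`, and all sizes are hypothetical names chosen for this example.

```python
class AddressableCache:
    """Toy model: a large backing memory plus a small, explicitly managed cache.

    Hypothetical illustration only; names and sizes are not from the paper.
    """

    def __init__(self, hbm_words, cache_words):
        self.hbm = list(range(hbm_words))   # stands in for one HBM2 channel
        self.bram = [0] * cache_words       # stands in for the BRAM cache

    def _check(self, hbm_addr, cache_addr, length):
        # Reject ranges that fall outside either memory.
        if length < 0 or hbm_addr < 0 or cache_addr < 0:
            raise ValueError("negative address or length")
        if hbm_addr + length > len(self.hbm):
            raise ValueError("HBM range out of bounds")
        if cache_addr + length > len(self.bram):
            raise ValueError("cache range out of bounds")

    def load(self, hbm_addr, cache_addr, length):
        """Bulk transfer: backing memory -> cache."""
        self._check(hbm_addr, cache_addr, length)
        self.bram[cache_addr:cache_addr + length] = \
            self.hbm[hbm_addr:hbm_addr + length]

    def store(self, cache_addr, hbm_addr, length):
        """Bulk transfer: cache -> backing memory."""
        self._check(hbm_addr, cache_addr, length)
        self.hbm[hbm_addr:hbm_addr + length] = \
            self.bram[cache_addr:cache_addr + length]


def reverse_block(mem, hbm_addr, length):
    """Reverse a block in place using only cache-side random access."""
    mem.load(hbm_addr, 0, length)            # one bulk read
    for i in range(length // 2):             # non-sequential accesses hit the cache
        j = length - 1 - i
        mem.bram[i], mem.bram[j] = mem.bram[j], mem.bram[i]
    mem.store(0, hbm_addr, length)           # one bulk write


mem = AddressableCache(hbm_words=16, cache_words=8)
reverse_block(mem, hbm_addr=4, length=4)
print(mem.hbm[:8])
```

The point of the sketch is the access pattern: the backing memory sees only two contiguous transfers, while the irregular accesses stay in the small buffer. Unlike a transparent cache, the program decides explicitly what is resident and where.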

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 21H04869. We also thank the Intel University Program for providing hardware and software.

Author information

Corresponding author

Correspondence to Norihisa Fujita.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fujita, N., Kobayashi, R., Yamaguchi, Y., Boku, T. (2023). Implementation and Performance Evaluation of Memory System Using Addressable Cache for HPC Applications on HBM2 Equipped FPGAs. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_9

  • DOI: https://doi.org/10.1007/978-3-031-31209-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31208-3

  • Online ISBN: 978-3-031-31209-0

  • eBook Packages: Computer Science, Computer Science (R0)
