Abstract
Today’s High-Performance Computing (HPC) systems often use GPUs as dedicated hardware accelerators to meet the computational requirements of applications such as neural networks, genetic decoding, and hydrodynamic simulations. Meanwhile, FPGAs have been considered as suitable alternative accelerators because of their advancing computational capabilities and low power consumption. Moreover, advances in High-Level Synthesis (HLS) allow users to generate FPGA designs directly from mainstream languages, e.g., C, C++, and OpenCL. However, writing high-level programs that perform well is still a time-consuming task, and a lack of knowledge about FPGA architecture can lead to poor scalability and portability. In this paper, we propose an architecture design for Computational Fluid Dynamics (CFD) simulations based on the HLS method. Our design can adjust its performance by exploiting parallelism in both the temporal and spatial domains of CFD simulations. We also discuss optimization choices for the data reuse buffer while considering the portability of HLS code. A performance model is introduced to guide design space exploration under the constraints of the resources available on the FPGA. We evaluate our design on a Xilinx VCU1525 FPGA board and compare the results with other state-of-the-art studies. Experimental results show that the VCU1525 can achieve 629.6 GFLOP/s on the D2Q9 LBM-BGK model, and that the design and optimization methods can be used for developing various CFD applications.
Acknowledgments
This work was supported in part by MEXT as Next Generation High-Performance Computing Infrastructures and Applications R&D Program (Development of Computing-Communication Unified Supercomputer in Next Generation), and by JSPS KAKENHI Grant Numbers JP17H01707 and JP18H03246. The authors would also like to thank Xilinx Inc. for providing FPGA software tools through the Xilinx University Program.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Du, C., Firmansyah, I., Yamaguchi, Y. (2020). FPGA-Based Computational Fluid Dynamics Simulation Architecture via High-Level Synthesis Design Method. In: Rincón, F., Barba, J., So, H., Diniz, P., Caba, J. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2020. Lecture Notes in Computer Science, vol 12083. Springer, Cham. https://doi.org/10.1007/978-3-030-44534-8_18
DOI: https://doi.org/10.1007/978-3-030-44534-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44533-1
Online ISBN: 978-3-030-44534-8
eBook Packages: Computer Science (R0)