ABSTRACT
We present the first FPGA implementation of the full simulation pipeline of a shallow water code based on the discontinuous Galerkin method. Using OpenCL and following an algorithm-hardware codesign approach, the software reference is transformed into a dataflow architecture that can process a full mesh element per clock cycle. A novel projection approach on the algorithmic level complements the pipeline and memory optimizations in the hardware design. With this, the FPGA kernels for different polynomial orders outperform the CPU reference by a factor of 43 to 144 in a strong scaling benchmark scenario. A performance model accurately explains the measured FPGA performance of up to 717 GFLOP/s.
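The relation between "one mesh element per clock cycle" and a sustained floating-point rate can be illustrated with a minimal sketch. This is not the performance model from the paper; it only assumes a fully pipelined design with no stalls, and the function name, parameters, and the input values in the usage lines are hypothetical placeholders rather than measured figures.

```python
def pipeline_gflops(flops_per_element: float,
                    clock_mhz: float,
                    elements_per_cycle: float = 1.0) -> float:
    """Idealized throughput (GFLOP/s) of a stall-free dataflow pipeline.

    All arguments are illustrative assumptions:
      flops_per_element  -- floating-point operations spent on one mesh element
      clock_mhz          -- kernel clock frequency in MHz
      elements_per_cycle -- elements entering the pipeline per clock cycle
    """
    elements_per_second = elements_per_cycle * clock_mhz * 1e6
    return flops_per_element * elements_per_second / 1e9


def speedup(t_reference_s: float, t_accelerated_s: float) -> float:
    """Ratio of reference runtime to accelerated runtime."""
    return t_reference_s / t_accelerated_s


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    print(pipeline_gflops(flops_per_element=1000, clock_mhz=200))
    print(speedup(t_reference_s=10.0, t_accelerated_s=0.25))
```

Under these assumptions the rate scales linearly with both the clock frequency and the work per element, which is why higher polynomial orders (more operations per element) can reach higher GFLOP/s at the same element throughput.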
Index Terms
- Algorithm-hardware co-design of a discontinuous Galerkin shallow-water model for a dataflow architecture on FPGA