Exploiting Batch Processing on Streaming Architectures to Solve 2D Elliptic Finite Element Problems: A Hybridized Discontinuous Galerkin (HDG) Case Study

King, James; Yakovlev, Sergey; Fu, Zhisong; Kirby, Robert M.; Sherwin, Spencer J.

doi:10.1007/s10915-013-9805-x

Published: 26 November 2013

Journal of Scientific Computing, Volume 60, pages 457–482 (2014)

Abstract

Numerical methods for elliptic partial differential equations (PDEs) within both continuous and hybridized discontinuous Galerkin (HDG) frameworks share the same general structure: local (elemental) matrix generation followed by a global linear system assembly and solve. The lack of inter-element communication and the easily parallelizable nature of the local matrix generation stage, coupled with the parallelization techniques developed for linear system solvers, make a numerical scheme for elliptic PDEs a good candidate for implementation on streaming architectures such as modern graphics processing units (GPUs). We propose an algorithmic pipeline for mapping an elliptic finite element method to the GPU and perform a case study for a particular method within the HDG framework. This study provides a comparison between CPU and GPU implementations of the method and highlights certain performance-critical implementation details. The choice of the HDG method for the case study was dictated by its computationally heavy local matrix generation stage and its reduced trace-based communication pattern, which together make the method amenable to the fine-grained parallelism of GPUs. We demonstrate that the HDG method is well-suited for GPU implementation, obtaining total speedups on the order of 30–35 times over a serial CPU implementation for moderately sized problems.
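
The three-stage structure described in the abstract (independent local matrix generation, global assembly, global solve) can be illustrated with a deliberately simplified sketch. This is not the authors' HDG or GPU implementation: it assumes a 1D Poisson problem, -u'' = 1 on [0, 1] with u(0) = u(1) = 0, discretized with linear continuous Galerkin elements, and it uses a CPU thread pool only as a stand-in for batch processing on a streaming device. All function names (`local_matrices`, `assemble`, `solve_dirichlet`, `run`) are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def local_matrices(h):
    """Stage 1: elemental stiffness matrix and load vector for -u'' = 1
    on one linear element of length h. No data from other elements is used."""
    k = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    f = [h / 2.0, h / 2.0]
    return k, f


def assemble(n_elems, batch):
    """Stage 2: scatter the elemental contributions into the global system."""
    n = n_elems + 1
    K = [[0.0] * n for _ in range(n)]
    F = [0.0] * n
    for e, (k, f) in enumerate(batch):
        dofs = (e, e + 1)  # global node numbers of element e
        for a in range(2):
            F[dofs[a]] += f[a]
            for b in range(2):
                K[dofs[a]][dofs[b]] += k[a][b]
    return K, F


def solve_dirichlet(K, F):
    """Stage 3: impose u(0) = u(1) = 0 and solve the interior system by
    Gaussian elimination without pivoting (the interior matrix is SPD)."""
    n = len(F)
    idx = list(range(1, n - 1))
    A = [[K[i][j] for j in idx] for i in idx]
    b = [F[i] for i in idx]
    m = len(b)
    for p in range(m):  # forward elimination
        for r in range(p + 1, m):
            factor = A[r][p] / A[p][p]
            for c in range(p, m):
                A[r][c] -= factor * A[p][c]
            b[r] -= factor * b[p]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(A[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (b[r] - s) / A[r][r]
    return [0.0] + x + [0.0]


def run(n_elems=8):
    h = 1.0 / n_elems
    # The elements are processed as one batch of independent tasks; the
    # thread pool only mimics the batch structure and is not a GPU kernel.
    with ThreadPoolExecutor() as pool:
        batch = list(pool.map(local_matrices, [h] * n_elems))
    K, F = assemble(n_elems, batch)
    return solve_dirichlet(K, F)


if __name__ == "__main__":
    print(run(8))
```

The point of the sketch is the data dependency pattern: stage 1 touches only per-element data and can be batched freely, while stages 2 and 3 are where elements couple. In the HDG setting of the paper that coupling occurs through trace unknowns on element boundaries rather than through shared nodes as assumed here.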

Acknowledgments

We would like to thank Professor B. Cockburn (U. Minnesota) for the helpful discussions on this topic. This work was supported by the Department of Energy (DOE NETL DE-EE0004449) and under NSF OCI-1148291.

Author information

Authors and Affiliations

School of Computing and Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, UT, USA
James King, Zhisong Fu & Robert M. Kirby
Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, UT, USA
Sergey Yakovlev
Department of Aeronautics, Imperial College London, London, UK
Spencer J. Sherwin

Corresponding author

Correspondence to Robert M. Kirby.

About this article

Cite this article

King, J., Yakovlev, S., Fu, Z. et al. Exploiting Batch Processing on Streaming Architectures to Solve 2D Elliptic Finite Element Problems: A Hybridized Discontinuous Galerkin (HDG) Case Study. J Sci Comput 60, 457–482 (2014). https://doi.org/10.1007/s10915-013-9805-x

Received: 14 March 2013
Revised: 09 September 2013
Accepted: 08 November 2013
Published: 26 November 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s10915-013-9805-x
