Efficient Implementation of Total FETI Solver for Graphic Processing Units Using Schur Complement

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9611)

Abstract

This paper presents a new approach for accelerating FETI solvers on Graphic Processing Units (GPUs) using the Schur complement (SC) technique. By using SCs, FETI solvers can avoid working with the sparse Cholesky decomposition of the stiffness matrices. Instead, a dense structure in the form of an SC is computed and used by the conjugate gradient (CG) solver. In every CG iteration, the forward and backward substitutions, which are sequential, are replaced by a highly parallel General Matrix Vector Multiplication (GEMV) routine. This results in a 4.1 times speedup when the Tesla K20X GPU accelerator is used and its performance is compared to a single 16-core AMD Opteron 6274 (Interlagos) CPU.
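The replacement of triangular solves by a dense matrix-vector product can be illustrated with a minimal pure-Python sketch. It is not the ESPRESO implementation: the matrices `K` and `B`, the helper names, and the assumption that `K` is nonsingular (for example, already regularized) are all hypothetical choices made for illustration. The sketch applies the operator B K⁻¹ Bᵀ to a vector in two ways, once through Cholesky forward and backward substitution and once through a precomputed dense matrix, and checks that both agree.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def matvec(M, v):
    """Dense matrix-vector product (the GEMV-like operation)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical data: a small SPD stiffness-like matrix K (assumed nonsingular)
# and a signed Boolean gluing matrix B. Neither comes from the paper.
K = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
B = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]

L = cholesky(K)
Bt = transpose(B)

def apply_with_substitution(lam):
    """Per-iteration path with two sequential triangular solves."""
    return matvec(B, solve_cholesky(L, matvec(Bt, lam)))

# One-time preprocessing: assemble the dense operator F = B K^{-1} B^T.
# Column j of F is B K^{-1} (B^T e_j), and B^T e_j is row j of B.
F = transpose([matvec(B, solve_cholesky(L, row)) for row in B])

def apply_with_dense_operator(lam):
    """Per-iteration path with a single dense matrix-vector product."""
    return matvec(F, lam)

lam = [0.5, -2.0]
r_substitution = apply_with_substitution(lam)
r_dense = apply_with_dense_operator(lam)
max_diff = max(abs(a - b) for a, b in zip(r_substitution, r_dense))
print("max difference between the two paths:", max_diff)
```

The dense path moves all factorization work into preprocessing, so each iteration reduces to one matrix-vector product, which is the kind of operation that maps well onto a GPU.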

The main bottleneck of this method is the computation of the Schur complements of the stiffness matrices. This bottleneck is significantly reduced by using the new PARDISO-SC sparse direct solver. This paper also presents a performance evaluation of SC computations for three-dimensional elasticity stiffness matrices.
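For orientation, the quantity being computed can be written down under some assumed notation that does not appear in the abstract: K denotes a subdomain stiffness matrix (taken here as nonsingular, for example after regularization), and B denotes the matrix that selects the degrees of freedom coupled to other subdomains. With these assumptions, the Schur complement of an augmented block matrix with respect to its K block is a small dense matrix:

```latex
\[
A =
\begin{pmatrix}
K & B^{T} \\
B & 0
\end{pmatrix},
\qquad
S = 0 - B K^{-1} B^{T} = -\,B K^{-1} B^{T},
\qquad
F = -S = B K^{-1} B^{T}.
\]
```

In this reading, F is the dense structure that the CG solver multiplies by in each iteration, and forming S is the expensive preprocessing step that a sparse direct solver with Schur complement support is meant to speed up.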

We present the performance evaluation of the proposed approach using our implementation in the ESPRESO solver package.

Acknowledgment

This work was supported by the Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project IT4Innovations excellence in science - LQ1602 and from the Large Infrastructures for Research, Experimental Development and Innovations project IT4Innovations National Supercomputing Center LM2015070; and by the EXA2CT project funded from the EU's Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 610741.

Author information

Corresponding author

Correspondence to Lubomír Říha.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Říha, L. et al. (2016). Efficient Implementation of Total FETI Solver for Graphic Processing Units Using Schur Complement. In: Kozubek, T., Blaheta, R., Šístek, J., Rozložník, M., Čermák, M. (eds) High Performance Computing in Science and Engineering. HPCSE 2015. Lecture Notes in Computer Science, vol 9611. Springer, Cham. https://doi.org/10.1007/978-3-319-40361-8_6

  • DOI: https://doi.org/10.1007/978-3-319-40361-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40360-1

  • Online ISBN: 978-3-319-40361-8

  • eBook Packages: Computer Science, Computer Science (R0)
