Accelerating a Massively Parallel Numerical Simulation in Electromagnetism Using a Cluster of GPUs

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8384)

Abstract

We have accelerated a legacy massively parallel code that solves the 3D Maxwell’s equations on a hybrid cluster enhanced with GPUs. To minimize the impact on our existing code, we combine its original Full-MPI approach with task parallelism to design an accelerated \(LL^t\) solver that efficiently shares the same GPUs between different processes and relies on optimized communication patterns. On 180 nodes of the Tera100 cluster, our GPU-accelerated \(LL^t\) decomposition reaches \(80\) TFlop/s on a problem with \(247980\) unknowns, whereas the machine’s sustained CPU and GPU peaks are respectively \(13\) and \(153\) TFlop/s.
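To make the \(LL^t\) decomposition named in the abstract concrete, here is a minimal serial sketch of a tiled factorization \(A = LL^t\) in Python with NumPy. It is not the authors' solver: the abstract does not describe the tile layout, the task graph, or how GPUs are shared, so the function names `factor_tile` and `tiled_llt`, the tile size `nb`, and the use of a plain (non-conjugated) transpose for complex symmetric input are all assumptions made for illustration. Each of the three inner steps is the kind of per-tile operation that a task-parallel runtime could schedule on a CPU core or a GPU.

```python
import numpy as np


def factor_tile(a):
    """Unblocked LL^t of one square tile, using a plain transpose.

    Hypothetical helper: no conjugation is applied, so it also accepts
    complex symmetric input. No pivoting is performed.
    """
    n = a.shape[0]
    l = np.zeros_like(a)
    for j in range(n):
        # Diagonal entry: remove the contribution of the columns already built.
        d = a[j, j] - l[j, :j] @ l[j, :j]
        l[j, j] = np.sqrt(d)
        # Entries below the diagonal in column j.
        l[j + 1:, j] = (a[j + 1:, j] - l[j + 1:, :j] @ l[j, :j]) / l[j, j]
    return l


def tiled_llt(a, nb):
    """Return lower-triangular L with L @ L.T == a, working on nb x nb tiles.

    Illustrative only: runs serially and assumes nb divides the matrix size.
    """
    n = a.shape[0]
    assert n % nb == 0, "this sketch assumes the tile size divides n"
    a = a.copy()
    nt = n // nb

    def tile(i, j):
        # View (not a copy) of tile (i, j) of the working matrix.
        return a[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]

    for k in range(nt):
        # Step 1: factor the diagonal tile.
        tile(k, k)[:] = factor_tile(tile(k, k))
        lkk = tile(k, k)
        # Step 2: panel solve, L_ik = A_ik * L_kk^{-t}.
        for i in range(k + 1, nt):
            tile(i, k)[:] = np.linalg.solve(lkk, tile(i, k).T).T
        # Step 3: update the trailing lower tiles, A_ij -= L_ik * L_jk^t.
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
    return np.tril(a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((8, 8))
    spd = m @ m.T + 8.0 * np.eye(8)  # small symmetric positive definite test matrix
    l = tiled_llt(spd, nb=4)
    print("max reconstruction error:", np.abs(l @ l.T - spd).max())
```

Within one iteration `k`, the panel solves in step 2 are independent of each other, as are the tile updates in step 3, which is what makes this formulation a natural fit for task parallelism.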


Author information

Corresponding author

Correspondence to Cédric Augonnet.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Augonnet, C., Goudin, D., Pujols, A., Sesques, M. (2014). Accelerating a Massively Parallel Numerical Simulation in Electromagnetism Using a Cluster of GPUs. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_55

  • DOI: https://doi.org/10.1007/978-3-642-55224-3_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55223-6

  • Online ISBN: 978-3-642-55224-3

  • eBook Packages: Computer Science, Computer Science (R0)
