
Improving the FMM performance using optimal group size on heterogeneous system architectures

The Journal of Supercomputing

Abstract

The fast multipole method (FMM) is commonly used to speed up the time to solution of a wide variety of N-body problems. To use the FMM, the elements that constitute the geometry of the problem are clustered into groups of a given size D, and the choice of D can strongly affect the time to solution of the FMM. The optimal value of D, in the sense of minimizing the time cost, is not known in advance and depends on several factors. Nevertheless, during the solver setup, it is possible to produce clusters of varying size D and to estimate the time to solution associated with each D. In this paper, we use octree structures to efficiently perform clustering and time cost estimation in order to find the optimal group size for FMM implementations on a heterogeneous architecture. Two different frameworks are analyzed: the single-level FMM and the fast Fourier transform FMM (FMM-FFT). We found that the sensitivity of the time cost to the parameter D depends on factors such as the problem size and the implementation of the FMM framework. Moreover, we observed that the time cost may be substantially reduced if a proper D is employed.
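The group-size search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Points are binned into the boxes of a uniform octree, so each candidate D is the edge of the root cube divided by 2^l at level l; finest-level integer box coordinates are computed once, and coarser levels are obtained by bit shifts, which keeps regrouping cheap. The function names and the cost function `toy_cost` (near-field element pairs plus all-to-all far-field group translations, with an assumed multipole size of (kD)^2 samples) are hypothetical stand-ins for the paper's time estimation, which is not reproduced here.

```python
import random
from collections import Counter
from itertools import product


def octree_keys(points, max_level):
    """Integer box coordinates of each 3-D point at the finest octree level.

    Coarser levels are derived by right-shifting these coordinates, so a
    single pass over the points serves every candidate group size D.
    """
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    edge = max(h - l for h, l in zip(hi, lo)) or 1.0  # side of the root cube
    n_cells = 1 << max_level
    keys = [
        tuple(min(int((p[i] - lo[i]) / edge * n_cells), n_cells - 1) for i in range(3))
        for p in points
    ]
    return keys, edge


def group_counts(keys, max_level, level):
    """Number of elements in each non-empty box at the given octree level."""
    shift = max_level - level
    return Counter(tuple(c >> shift for c in k) for k in keys)


def toy_cost(counts, box_size, wavenumber, c_near=1.0, c_far=1.0):
    """HYPOTHETICAL cost model, not the paper's estimator."""
    near_pairs = 0
    for (x, y, z), n in counts.items():
        # Direct interactions with the box itself and its 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            near_pairs += n * counts.get((x + dx, y + dy, z + dz), 0)
    m = len(counts)  # number of non-empty groups
    samples = max(1.0, (wavenumber * box_size) ** 2)  # assumed multipole size
    # Single-level style far field: every group pair, one translation each.
    return c_near * near_pairs + c_far * m * m * samples


def best_group_size(points, max_level=5, wavenumber=1.0):
    """Estimate the cost for every octree level and return the cheapest D."""
    keys, edge = octree_keys(points, max_level)
    costs = {}
    for level in range(max_level + 1):
        d = edge / (1 << level)  # group size D at this level
        costs[d] = toy_cost(group_counts(keys, max_level, level), d, wavenumber)
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(2000)]
    d_opt, costs = best_group_size(pts, max_level=5, wavenumber=20.0)
    for d, c in sorted(costs.items(), reverse=True):
        print(f"D = {d:.4f}  estimated cost = {c:.3e}")
    print("chosen D:", d_opt)
```

Small D makes the near field cheap but multiplies the number of groups (and thus far-field translations), while large D does the opposite, which is why a sweep over D has an interior minimum. In a real solver, `toy_cost` would be replaced by timings or calibrated estimates of each FMM stage on the target CPU/GPU hardware.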


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


Notes

  1. 4 cores at 3.6 GHz (Hyper-Threading enabled and Turbo Boost disabled).

  2. 1536 CUDA cores and 2 GB of GDDR5.

  3. http://www.fftw.org/.

  4. Note that \(k_\mathrm{l} \propto \sqrt{N_\mathrm{g}}\); thus its drop is slower than the rise of \(N_\mathrm{g}\) (or of \(Q\propto N_\mathrm{g}^{1.5}\)).


Acknowledgments

This work has been supported by the “Ministerio de Economía y Competitividad” of Spain / FEDER under Projects TEC2014-54005-P and TEC2015-67387-C4-3-R; and by the “Gobierno del Principado de Asturias” / FEDER under Project FC-15-GRUPIN14-114.

Author information

Corresponding author

Correspondence to José Ranilla.


About this article


Cite this article

López-Fernández, J.A., López-Portugués, M. & Ranilla, J. Improving the FMM performance using optimal group size on heterogeneous system architectures. J Supercomput 73, 291–301 (2017). https://doi.org/10.1007/s11227-016-1860-2


  • DOI: https://doi.org/10.1007/s11227-016-1860-2
