Improving the FMM performance using optimal group size on heterogeneous system architectures

López-Fernández, J. A.; López-Portugués, M.; Ranilla, José

doi:10.1007/s11227-016-1860-2

Published: 08 September 2016

Volume 73, pages 291–301 (2017)

The Journal of Supercomputing

J. A. López-Fernández¹,
M. López-Portugués¹ &
José Ranilla ORCID: orcid.org/0000-0003-2941-3741²

Abstract

The fast multipole method (FMM) is commonly used to reduce the time to solution of a wide variety of N-body-type problems. To apply the FMM, the elements that make up the problem geometry are clustered into groups of a given size (D), a parameter that can strongly affect the time to solution. The optimal value of D, in the sense of minimizing the time cost, is not known beforehand and depends on several factors. Nevertheless, during the solver setup it is possible to produce clusters of varying size D and to estimate the time to solution associated with each D. In this paper, we use octree structures to efficiently perform the clustering and the time cost estimation in order to find the optimal group size for FMM implementations on a heterogeneous architecture. In addition, two different frameworks have been analyzed: the single-level FMM and the fast Fourier transform FMM (FMM-FFT). We found that the sensitivity of the time cost to the parameter D depends on factors such as the problem size or the implementation of the FMM framework. Moreover, we observed that the time cost may be considerably reduced if a proper D is employed.
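The setup-time search for D described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and everything below is an assumption made for illustration: elements are reduced to points in the unit cube, each octree level L defines one candidate clustering (so D is the average number of elements per non-empty box), and the cost model (the helper names `group_sizes`, `estimate_cost` and `optimal_level`, the coefficients `c_near`, `c_far`, `c_agg`, and the far-field sample count `k0 / 4**L`) is hypothetical, loosely following single-level FMM scaling. A real solver would presumably calibrate such coefficients by timing its CPU and GPU kernels.

```python
import math
import random
from collections import Counter
from itertools import product


def group_sizes(points, level):
    """Cluster points of the unit cube into the octree boxes of a given level.

    Level L splits the cube into 2**L boxes per axis; each non-empty box is a group.
    Returns {(ix, iy, iz): number of elements in that box}.
    """
    n = 2 ** level
    counts = Counter()
    for x, y, z in points:
        # min(...) keeps points lying on the upper faces inside the last box
        key = (min(int(x * n), n - 1), min(int(y * n), n - 1), min(int(z * n), n - 1))
        counts[key] += 1
    return counts


def estimate_cost(points, level, k0=4096.0, k_min=16.0,
                  c_near=1.0, c_far=1.0, c_agg=1.0):
    """Hypothetical single-level FMM cost model evaluated on the octree at `level`."""
    boxes = group_sizes(points, level)
    n_groups = len(boxes)
    # Hypothetical far-field sample count: proportional to the box face area,
    # so it shrinks by a factor of 4 each time the box edge is halved.
    k = max(k_min, k0 / 4 ** level)
    near_pairs = 0        # element-element interactions computed directly
    near_group_pairs = 0  # ordered (group, neighbour group) pairs, self included
    for (ix, iy, iz), size in boxes.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            nbr = boxes.get((ix + dx, iy + dy, iz + dz))
            if nbr:
                near_pairs += size * nbr
                near_group_pairs += 1
    # Every remaining ordered group pair needs one far-field translation of k samples.
    far_group_pairs = n_groups * n_groups - near_group_pairs
    # Aggregation plus disaggregation: each element touches k samples twice.
    agg = 2 * len(points) * k
    return c_near * near_pairs + c_far * far_group_pairs * k + c_agg * agg


def optimal_level(points, max_level=6, **model):
    """Scan octree levels; return (best level, average group size D, all costs)."""
    costs = {lvl: estimate_cost(points, lvl, **model)
             for lvl in range(1, max_level + 1)}
    best = min(costs, key=costs.get)
    avg_d = len(points) / len(group_sizes(points, best))
    return best, avg_d, costs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical geometry: 5000 elements on a sphere of radius 0.45 centred in the cube
    pts = []
    for _ in range(5000):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(c * c for c in v))
        pts.append(tuple(0.5 + 0.45 * c / r for c in v))
    best, avg_d, costs = optimal_level(pts)
    for lvl, cost in costs.items():
        print(f"level {lvl}: estimated cost {cost:.3e}")
    print(f"chosen level {best}, average group size D = {avg_d:.1f}")
```

In this sketch each candidate level costs one linear bucketing pass plus a 27-neighbour lookup per group, so scanning several values of D stays cheap compared with running the solver itself. An FMM-FFT variant would only need a different `estimate_cost`.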

Notes

4 cores at 3.6 GHz (Hyper-Threading enabled and Turbo Boost disabled).
1536 CUDA cores and 2 GB of GDDR5.
http://www.fftw.org/.
Note that \(k_\mathrm{l} \propto \sqrt{N_\mathrm{g}}\); hence its drop is slower than the rise of \(N_\mathrm{g}\) (or of \(Q\propto N_\mathrm{g}^{1.5}\)).

Acknowledgments

This work has been supported by the “Ministerio de Economía y Competitividad” of Spain / FEDER under Projects TEC2014-54005-P and TEC2015-67387-C4-3-R; and by the “Gobierno del Principado de Asturias” / FEDER under Project FC-15-GRUPIN14-114.

Author information

Authors and Affiliations

Departamento de Ingeniería Eléctrica, Electrónica, de Computadores y Sistemas, Universidad de Oviedo, Gijón, Spain
J. A. López-Fernández & M. López-Portugués
Departamento de Informática, Universidad de Oviedo, Gijón, Spain
José Ranilla

Corresponding author

Correspondence to José Ranilla.

About this article

Cite this article

López-Fernández, J.A., López-Portugués, M. & Ranilla, J. Improving the FMM performance using optimal group size on heterogeneous system architectures. J Supercomput 73, 291–301 (2017). https://doi.org/10.1007/s11227-016-1860-2

Issue Date: January 2017
DOI: https://doi.org/10.1007/s11227-016-1860-2
