An (almost) direct deployment of the Fast Multipole Method on the Cell processor

Fortin, Pierre; Lamotte, Jean-Luc

doi:10.1007/s11227-013-0877-z

Published: 25 January 2013

Volume 65, pages 1205–1222 (2013)

The Journal of Supercomputing

Abstract

This paper presents the first deployment of the Fast Multipole Method on the Cell processor (PowerXCell 8i). We rely on the matrix formulation with BLAS routines of the FMB code (Fast Multipole with BLAS) in order to directly and efficiently offload the most time-consuming operators of both far field and near field computations to the heterogeneous cores of the Cell. We first detail the difficulties that had to be solved, and we finally obtain a deployment in both single and double precision, which scales linearly on several Cell blades and which is able to handle both uniform and non-uniform distributions of particles. We also present our performance results and comparisons with multicore CPUs, as well as the limitations of our deployment on the Cell processor.

Acknowledgements

This work was carried out with partial support from HPC@LR, a Competence Center in High-Performance Computing from the Languedoc-Roussillon region, funded by the Languedoc-Roussillon region, the European Union, and the Université Montpellier 2 Sciences et Techniques. The authors would like to cordially thank the system teams at HPC@LR and at Polytech’Paris-UPMC, as well as B. Cirou at CINES, for helpful assistance during the performance tests.

Author information

Authors and Affiliations

LIP6, UPMC Univ. Paris 06 and CNRS UMR 7606, 4 place Jussieu, 75252, Paris cedex 05, France
Pierre Fortin & Jean-Luc Lamotte

Corresponding author

Correspondence to Pierre Fortin.

About this article

Cite this article

Fortin, P., Lamotte, JL. An (almost) direct deployment of the Fast Multipole Method on the Cell processor. J Supercomput 65, 1205–1222 (2013). https://doi.org/10.1007/s11227-013-0877-z

Issue Date: September 2013
DOI: https://doi.org/10.1007/s11227-013-0877-z
