Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

Published in: The Journal of Supercomputing

Abstract

A many-core parallel implementation of the multilevel fast multipole algorithm (MLFMA), based on the Athread parallel programming model, is presented for China's homegrown SW26010 many-core processor. Data access efficiency is improved by organizing the data as a structure of arrays. Adaptive workload distribution strategies are adopted on different levels of the MLFMA tree to make full use of the computing capability and the scratchpad memory. A double-buffering scheme is designed to overlap communication with computation. The resulting Athread-based implementation can solve real-life problems with over one million unknowns. Its capability and efficiency are analyzed through examples of scattering by spheres and by a practical aircraft. Numerical results show that the proposed parallel scheme achieves total speedups of 6.4 to 8.0 relative to the CPU master core.
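The two data-movement ideas named in the abstract, a structure-of-arrays layout and double buffering, can be illustrated with a minimal Python sketch. This is not the authors' Athread code: the function names (`to_soa`, `load_chunk`, `sum_power_double_buffered`), the chunk size, and the toy kernel that sums |z|^2 are all assumptions made for illustration, and the "transfer" here is an ordinary synchronous slice standing in for an asynchronous copy into scratchpad memory.

```python
from typing import List, Tuple


def to_soa(samples: List[complex]) -> Tuple[List[float], List[float]]:
    """Split complex samples into separate real and imaginary arrays (SoA)."""
    return [z.real for z in samples], [z.imag for z in samples]


def load_chunk(re: List[float], im: List[float], start: int, size: int):
    """Hypothetical stand-in for a transfer from main memory to a local buffer."""
    return re[start:start + size], im[start:start + size]


def sum_power_double_buffered(re: List[float], im: List[float], chunk: int) -> float:
    """Accumulate |z|^2 chunk by chunk, alternating between two buffers."""
    n = len(re)
    buffers = [load_chunk(re, im, 0, chunk), None]  # prime the first buffer
    current = 0
    total = 0.0
    for start in range(0, n, chunk):
        nxt = start + chunk
        if nxt < n:
            # On real hardware this load would be issued asynchronously,
            # so it could proceed while the current buffer is processed.
            buffers[1 - current] = load_chunk(re, im, nxt, chunk)
        buf_re, buf_im = buffers[current]
        total += sum(r * r + i * i for r, i in zip(buf_re, buf_im))
        current = 1 - current  # swap roles of the two buffers
    return total


samples = [complex(k, -k) for k in range(10)]
re_part, im_part = to_soa(samples)
result = sum_power_double_buffered(re_part, im_part, chunk=4)
```

In this sketch the separate `re_part` and `im_part` arrays keep each component contiguous, and the buffer swap shows where a real implementation would overlap the next transfer with the current computation.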

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2017YFB0202500), and the NSFC (Grant Nos. 61971034 and U1730102).

Author information

Corresponding author

Correspondence to Ming-Lin Yang.

About this article

Cite this article

He, WJ., Yang, ML., Wang, W. et al. Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor. J Supercomput 77, 1502–1516 (2021). https://doi.org/10.1007/s11227-020-03308-9

  • DOI: https://doi.org/10.1007/s11227-020-03308-9
