Extreme-scale parallel computing: bottlenecks and strategies

Mo, Ze-yao

doi:10.1631/FITEE.1800421


Perspective
Published: 28 November 2018

Volume 19, pages 1251–1260 (2018)

Frontiers of Information Technology & Electronic Engineering

Ze-yao Mo, ORCID: orcid.org/0000-0003-3280-5682


Abstract

Extreme-scale numerical simulations place heavy demands on extreme-scale parallel computing capabilities. To address the challenges of developing these capabilities toward exascale, we systematically analyze the major bottlenecks in parallel computing research from three perspectives: computational scale, computing efficiency, and programming productivity. For these bottlenecks, we identify a series of urgent key issues and propose corresponding strategies. This study is intended to help the development of numerical computing capability keep pace with the growth of supercomputer peak performance.




Author information

Authors and Affiliations

CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088, China
Ze-yao Mo
Institute of Applied Physics and Computational Mathematics, Beijing, 100094, China
Ze-yao Mo


Corresponding author

Correspondence to Ze-yao Mo.

Additional information

Project supported by the National Natural Science Foundation of China (No. 91430218) and the National Key Technology R&D Program of China (Nos. 2016YFB0201300 and 2017YFB0202103).


About this article

Cite this article

Mo, Zy. Extreme-scale parallel computing: bottlenecks and strategies. Frontiers Inf Technol Electronic Eng 19, 1251–1260 (2018). https://doi.org/10.1631/FITEE.1800421


Received: 07 July 2018
Revised: 14 September 2018
Accepted: 15 October 2018
Published: 28 November 2018
Issue Date: October 2018
DOI: https://doi.org/10.1631/FITEE.1800421

CLC number

TP311
