
Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer

The Journal of Supercomputing

Abstract

CitcomCu is numerical simulation software for mantle convection in the field of geodynamics; it can simulate thermo-chemical convection in a three-dimensional domain. The growing demand for high-precision simulations at larger application scales requires larger-scale computing systems. However, the parallel efficiency of CitcomCu is difficult to improve on large-scale heterogeneous parallel computing systems; in particular, it cannot adapt to the current mainstream heterogeneous high-performance computing architecture that combines CPUs and accelerators. In this paper, we propose a parallel computing framework for geodynamic numerical simulation on heterogeneous computing architectures, based on the Tianhe new-generation high-performance computer. First, the data partitioning scheme of CitcomCu was optimized for the large-scale heterogeneous computing architecture to reduce the overall communication overhead. Second, the iterative solution algorithm of CitcomCu was improved to speed up the solution process. Finally, the SIMD-based NEON instruction set was applied to the sparse matrix operations in the solution process to improve parallel efficiency. Based on our parallel computing framework, the optimized CitcomCu was deployed and tested on the Tianhe new-generation high-performance computer. Experimental data showed that the optimized program achieved a 3.3975x speedup over the unoptimized program on a single node. Taking 50,000 computational cores as the baseline, the parallel efficiency of the unoptimized program on one million computational cores was 36.75%, while that of the optimized program reached 42.71%, a relative improvement of 16.22%. In addition, the optimized program can be executed on 40 million computational cores with a parallel efficiency of 36.54%.
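The third optimization named above applies the ARM NEON SIMD instruction set to sparse matrix operations inside the solver. As a minimal illustrative sketch only (not the paper's actual kernel), the following C function vectorizes a CSR sparse matrix-vector product with AArch64 NEON intrinsics; the function name, the CSR storage layout, and the two-wide double-precision strategy are assumptions introduced here for illustration.

#include <arm_neon.h>
#include <stddef.h>

/* Sketch of a NEON-vectorized CSR SpMV, y = A*x. Illustrative only;
 * CitcomCu's actual kernels may organize data differently. */
void spmv_csr_neon(size_t nrows,
                   const int *row_ptr,   /* nrows+1 row offsets */
                   const int *col_idx,   /* column index per nonzero */
                   const double *vals,   /* nonzero values */
                   const double *x,
                   double *y)
{
    for (size_t i = 0; i < nrows; ++i) {
        float64x2_t acc = vdupq_n_f64(0.0);   /* two partial sums */
        int j = row_ptr[i];
        const int end = row_ptr[i + 1];
        /* Two nonzeros per iteration via fused multiply-add. NEON has
         * no hardware gather, so x is fetched with scalar loads. */
        for (; j + 1 < end; j += 2) {
            float64x2_t a = vld1q_f64(&vals[j]);
            double xg[2] = { x[col_idx[j]], x[col_idx[j + 1]] };
            float64x2_t xv = vld1q_f64(xg);
            acc = vfmaq_f64(acc, a, xv);
        }
        double sum = vaddvq_f64(acc);          /* horizontal add */
        if (j < end)                           /* odd remainder */
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}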


Data availability statement

Not applicable.

Code availability

Not applicable.


Funding

The research was partially funded by the Key-Area Research and Development Program of Guangdong Province (2021B0101190004), the National Key R&D Program of China (Grant No. 2021YFB0300800), the Key Program of the National Natural Science Foundation of China (Grant Nos. U21A20461 and 92055213), the National Natural Science Foundation of China (Grant No. 61872127), and the project "Research on High Precision Numerical Simulation and Parallel Computing Method for Ion Implanted Silicon Carbide Semiconductor Doping Process" (U21A20461).

Author information

Authors and Affiliations

Authors

Contributions

The paper properly credits the meaningful contributions of all authors. All authors have been personally and actively involved in substantial work leading to the paper and will take public responsibility for its content.

Corresponding author

Correspondence to Wangdong Yang.

Ethics declarations

Conflict of interest

This material is the authors’ own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors’ own research and analysis in a truthful and complete manner.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

A.1 Noun introduction table

See Appendix Table 5.

Table 5 Noun introduction table

A.2 Input file

# 1. Input and Output Files Information

datafile="CASE2/caseA"

use_scratch="local"

oldfile="CASE2/caseA"

restart=0

restart_timesteps=20000

stokes_flow_only=0

maxstep=1000

storage_spacing=50

# 2. Geometry, Ra numbers, Internal heating, Thermochemical/Purely thermal convection

Solver=multigrid node_assemble=1

rayleigh=10.97394e5

rayleigh_comp=1e6

composition=0

Q0=0

Q0_enriched=0

markers_per_ele=15

comp_depth=0.605

visc_heating=0

adi_heating=0

# 3. Grid And Multiprocessor Information

nprocx=16

nprocz=16

nprocy=8

nodex=33 nodez=33 nodey=33

mgunitx=32

mgunitz=32

mgunity=16

levels=3

# 4. Coordinate Information

Geometry=cart3d

dimenx=1.0

dimenz=1.0

dimeny=1.0

z_grid_layers=2

zz=0.0,1.0

nz=1,129

x_grid_layers=2

xx=0,1

nx=1,129

y_grid_layers=2

yy=0,1

ny=1,65

z_lmantle=0.76655052

z_410=0.857143

z_lith=0.9651568

# 5. Rheology

rheol=0

TDEPV=off

VISC_UPDATE=off

update_every_steps=2

num_mat=4

visc0=1.0e0,1.0e0,1.0e0,1.0e0

viscE=6.9077553,6.9077553,6.9077553,6.9077553

viscT=273,273,273,273

viscZ=5e-6,5e-6,5e-6,5e-6

SDEPV=off

sdepv_misfit=0.010

sdepv_expt=1,1,1,1

sdepv_trns=1.e0,1.e0,1.e0,1.e0

VMIN=on visc_min=5.0e-2

VMAX=on visc_max=2.0e04

visc_smooth_cycles=1

Viscosity=system

# 6. DIMENSIONAL INFORMATION and Depth-dependence

layerd=2870000.0

radius=6370000.0

ReferenceT=3800.0

refvisc=1.0e20

density=3300.0

thermdiff=1.0e-6

gravacc=9.8

thermexp=5e-5

cp=1250

wdensity=0.0

visc_factor=1.0

thermexp_factor=1.0

thermdiff_factor=1.00

dissipation_number=2.601

surf_temp=0.078947

# 7. phase changes: to turn off any of the phase changes, let Ra_XXX=0

Ra_410=0.0

Ra_670=0.0

clapeyron410=3.0e6

clapeyron670=-3.0e6

width410=3.5e4

width670=3.5e4

# 8. BOUNDARY CONDITIONS and Initial perturbations

topvbc=0

topvbxval=0.0

topvbyval=0.0

botvbc=0

botvbxval=0.0

botvbyval=0.0

toptbc=1 bottbc=1

toptbcval=0.0 bottbcval=1.0

periodicx=off

periodicy=off

flowthroughx=off

flowthroughy=off

num_perturbations=1

perturbmag=0.001

perturbk=1.0

perturbl=6.0

perturbm=0.0

# 9. SOLVER RELATED MATTERS

Problem=convection

aug_lagr=on

aug_number=1.0e3

precond=on

orthogonal=off

maxsub=1

viterations=2

mg_cycle=1

down_heavy=3

up_heavy=3

vlowstep=20

vhighstep=3

piterations=375

accuracy=1.0e-2

tole_compressibility=1e-7

# Tuning of energy equation

adv_sub_iterations=2

finetunedt=0.75

ll_max=20

nlong=180

nlati=90

# Data input and program debugging

DESCRIBE=off

BEGINNER=off

VERBOSE=off

verbose=off

COMPRESS=off

see_convergence=1

# vim:ts=8:sw=8
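For orientation, the processor-grid parameters above (nprocx=16, nprocz=16, nprocy=8) imply 16 x 16 x 8 = 2048 MPI processes, each assigned a 33 x 33 x 33 local node block per the nodex/nodez/nodey settings. The short C sketch below shows one plausible rank-to-block mapping for such a grid; the traversal order (x fastest, then z, then y) is an assumption for illustration and may not match CitcomCu's actual decomposition scheme.

#include <stdio.h>

/* Illustrative only: maps a linear MPI rank to 3-D block coordinates
 * for the processor grid declared in the input file above. */
int main(void)
{
    const int nprocx = 16, nprocz = 16, nprocy = 8;  /* from Section A.2 */
    const int nproc  = nprocx * nprocz * nprocy;     /* 2048 ranks total */

    for (int rank = 0; rank < nproc; ++rank) {
        int px = rank % nprocx;                      /* block index along x */
        int pz = (rank / nprocx) % nprocz;           /* block index along z */
        int py = rank / (nprocx * nprocz);           /* block index along y */
        if (rank < 4 || rank == nproc - 1)           /* print a small sample */
            printf("rank %4d -> block (px=%2d, pz=%2d, py=%d)\n",
                   rank, px, pz, py);
    }
    return 0;
}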

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, J., Yang, W., Qi, R. et al. Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer. J Supercomput 80, 331–362 (2024). https://doi.org/10.1007/s11227-023-05469-9

