Abstract
CitcomCu is a numerical simulation code for mantle convection in geodynamics that can simulate thermo-chemical convection in a three-dimensional domain. The growing demand for high-precision simulations at larger application scales calls for larger-scale computing systems. However, the parallel efficiency of CitcomCu is difficult to improve on large-scale heterogeneous parallel computing systems; in particular, it cannot exploit the now-mainstream heterogeneous high-performance computing architectures that combine CPUs with accelerators. In this paper, we propose a parallel computing framework for geodynamic numerical simulation on a heterogeneous computing architecture, based on the Tianhe new-generation high-performance computer. First, the data partitioning scheme of CitcomCu is redesigned for the large-scale heterogeneous architecture to reduce the overall communication overhead. Second, the iterative solution algorithm of CitcomCu is improved to accelerate the solution process. Finally, the SIMD-based NEON instruction set is applied to the sparse matrix operations in the solver to improve parallel efficiency. Based on our parallel computing framework, the optimized CitcomCu was deployed and tested on the Tianhe new-generation high-performance computer. Experimental results show that, on a single node, the performance of the optimized program was 3.3975 times higher than that of the unoptimized program. Taking 50,000 computational cores as the baseline, the parallel efficiency of the unoptimized program on one million computational cores was 36.75%, while that of the optimized program improved by 16.22%, reaching 42.71%. In addition, the optimized program can run on 40 million computational cores with a parallel efficiency of 36.54%.
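To illustrate the kind of SIMD kernel the abstract refers to, the following is a minimal sketch of a CSR sparse matrix-vector product vectorized with AArch64 NEON intrinsics (two double-precision lanes). It is not the authors' implementation; the CSR field names (row_ptr, col_idx, val) and the two-lane layout are assumptions made for this example.

#include <arm_neon.h>
#include <stddef.h>

/* Illustrative sketch: CSR sparse matrix-vector product y = A * x using
 * AArch64 NEON intrinsics. Field names and layout are assumptions for the
 * example, not CitcomCu's actual data structures. */
void spmv_csr_neon(size_t nrows, const int *row_ptr, const int *col_idx,
                   const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; ++i) {
        float64x2_t acc = vdupq_n_f64(0.0);            /* two partial sums   */
        int j = row_ptr[i];
        const int end = row_ptr[i + 1];
        for (; j + 1 < end; j += 2) {
            float64x2_t a  = vld1q_f64(&val[j]);       /* two matrix entries */
            float64x2_t xv = vcombine_f64(vdup_n_f64(x[col_idx[j]]),
                                          vdup_n_f64(x[col_idx[j + 1]]));
            acc = vfmaq_f64(acc, a, xv);               /* acc += a * xv      */
        }
        double sum = vaddvq_f64(acc);                  /* horizontal add     */
        if (j < end)                                   /* odd leftover entry */
            sum += val[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

Note that the gathers of x through col_idx remain scalar loads in this sketch; reordering the matrix into a blocked or ELL-like layout is a common way to make those accesses contiguous and is independent of the vectorized accumulation shown here.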
Data availability statement
Not applicable.
Code availability
Not applicable.
Funding
The research was partially funded by the Key-Area Research and Development Program of Guangdong Province (2021B0101190004), the National Key R&D Program of China (Grant No. 2021YFB0300800), the Key Program of the National Natural Science Foundation of China (Grant Nos. U21A20461, 92055213), the National Natural Science Foundation of China (Grant No. 61872127), and the project "Research on High Precision Numerical Simulation and Parallel Computing Method for Ion Implanted Silicon Carbide Semiconductor Doping Process" (U21A20461).
Author information
Authors and Affiliations
Contributions
The paper properly credits the meaningful contributions of all authors. All authors have been personally and actively involved in substantial work leading to the paper and will take public responsibility for its content.
Corresponding author
Ethics declarations
Conflict of interest
This material is the authors’ own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors’ own research and analysis in a truthful and complete manner.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
A.1 Nomenclature table
See Appendix Table 5.
A.2 Input file
The parameter file for the test case is listed below; a short sketch relating its processor-decomposition keys to the MPI launch follows the listing.
# 1. Input and Output Files Information
datafile="CASE2/caseA"
use_scratch="local"
oldfile="CASE2/caseA"
restart=0
restart_timesteps=20000
stokes_flow_only=0
maxstep=1000
storage_spacing=50
# 2. Geometry, Ra numbers, Internal heating, Thermochemical/Purely thermal convection
Solver=multigrid node_assemble=1
rayleigh=10.97394e5
rayleigh_comp=1e6
composition=0
Q0=0
Q0_enriched=0
markers_per_ele=15
comp_depth=0.605
visc_heating=0
adi_heating=0
# 3. Grid And Multiprocessor Information
nprocx=16
nprocz=16
nprocy=8
nodex=33 nodez=33 nodey=33
mgunitx=32
mgunitz=32
mgunity=16
levels=3
# 4. Coordinate Information
Geometry=cart3d
dimenx=1.0
dimenz=1.0
dimeny=1.0
z_grid_layers=2
zz=0.0,1.0
nz=1,129
x_grid_layers=2
xx=0,1
nx=1,129
y_grid_layers=2
yy=0,1
ny=1,65
z_lmantle=0.76655052
z_410=0.857143
z_lith=0.9651568
# 5. Rheology
rheol=0
TDEPV=off
VISC_UPDATE=off
update_every_steps=2
num_mat=4
visc0=1.0e0,1.0e0,1.0e0,1.0e0
viscE=6.9077553,6.9077553,6.9077553,6.9077553
viscT=273,273,273,273
viscZ=5e-6,5e-6,5e-6,5e-6
SDEPV=off
sdepv_misfit=0.010
sdepv_expt=1,1,1,1
sdepv_trns=1.e0,1.e0,1.e0,1.e0
VMIN=on visc_min=5.0e-2
VMAX=on visc_max=2.0e04
visc_smooth_cycles=1
Viscosity=system
# 6. DIMENSIONAL INFORMATION and Depth-dependence
layerd=2870000.0
radius=6370000.0
ReferenceT=3800.0
refvisc=1.0e20
density=3300.0
thermdiff=1.0e-6
gravacc=9.8
thermexp=5e-5
cp=1250
wdensity=0.0
visc_factor=1.0
thermexp_factor=1.0
thermdiff_factor=1.00
dissipation_number=2.601
surf_temp=0.078947
# 7. phase changes: to turn off any of the phase changes, let Ra_XXX=0
Ra_410=0.0
Ra_670=0.0
clapeyron410=3.0e6
clapeyron670=-3.0e6
width410=3.5e4
width670=3.5e4
# 8. BOUNDARY CONDITIONS and Initial perturbations
topvbc=0
topvbxval=0.0
topvbyval=0.0
botvbc=0
botvbxval=0.0
botvbyval=0.0
toptbc=1 bottbc=1
toptbcval=0.0 bottbcval=1.0
periodicx=off
periodicy=off
flowthroughx=off
flowthroughy=off
num_perturbations=1
perturbmag=0.001
perturbk=1.0
perturbl=6.0
perturbm=0.0
# 9. SOLVER RELATED MATTERS
Problem=convection
aug_lagr=on
aug_number=1.0e3
precond=on
orthogonal=off
maxsub=1
viterations=2
mg_cycle=1
down_heavy=3
up_heavy=3
vlowstep=20
vhighstep=3
piterations=375
accuracy=1.0e-2
tole_compressibility=1e-7
# Tuning of energy equation
adv_sub_iterations=2
finetunedt=0.75
ll_max=20
nlong=180
nlati=90
# Data input and program debugging
DESCRIBE=off
BEGINNER=off
VERBOSE=off
verbose=off
COMPRESS=off
see_convergence=1
# vim:ts=8:sw=8
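In the listing above, the nprocx, nprocz and nprocy keys define the three-dimensional processor decomposition, so this configuration is expected to be launched with 16 x 16 x 8 = 2048 MPI ranks. The following minimal sketch (not part of CitcomCu; the hard-coded values simply mirror the listing) checks that expectation at start-up.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative start-up check: verify that the MPI world size matches the
 * processor grid declared in the input file above. */
int main(int argc, char **argv)
{
    int world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int nprocx = 16, nprocz = 16, nprocy = 8;   /* values from the listing  */
    int expected = nprocx * nprocz * nprocy;    /* 16 * 16 * 8 = 2048 ranks */

    if (world_size != expected) {
        fprintf(stderr, "decomposition needs %d ranks, got %d\n",
                expected, world_size);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }
    MPI_Finalize();
    return 0;
}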
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, J., Yang, W., Qi, R. et al. Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer. J Supercomput 80, 331–362 (2024). https://doi.org/10.1007/s11227-023-05469-9
DOI: https://doi.org/10.1007/s11227-023-05469-9