Abstract
In this work, we present AWP-ODC-OS, an end-to-end optimization of AWP-ODC targeting homogeneous, manycore supercomputers. AWP-ODC is an established community software package that simulates seismic wave propagation using a staggered finite difference scheme, which is fourth-order accurate in space and second-order accurate in time. Recent production simulations, e.g., using the software for the computation of seismic hazard maps, largely relied on GPU-accelerated supercomputers. In contrast, our work gives a comprehensive overview of the steps required to achieve near-optimal performance on the Intel® Xeon Phi™ x200 processor (code-named Knights Landing), and compares our competitive performance results to the most recent GPU architectures.
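To make the phrase "fourth-order accurate in space" concrete, the following is a minimal sketch of the standard fourth-order staggered-grid first-derivative stencil (weights 9/8 and -1/24), applied to a periodic sine wave in one dimension. It is not taken from AWP-ODC-OS; the function names `staggered_dx` and `max_error` and the test function are illustrative assumptions.

```python
import math

# Standard fourth-order staggered-grid weights for a first derivative.
C1 = 9.0 / 8.0
C2 = -1.0 / 24.0


def staggered_dx(f, i, h):
    """Approximate df/dx at the half point x_{i+1/2} from f[i-1] .. f[i+2]."""
    return (C1 * (f[i + 1] - f[i]) + C2 * (f[i + 2] - f[i - 1])) / h


def max_error(n):
    """Largest stencil error for f(x) = sin(x) on a periodic grid with n cells."""
    h = 2.0 * math.pi / n
    f = [math.sin(j * h) for j in range(n)]
    err = 0.0
    for i in range(n):
        # Gather the four samples around x_{i+1/2}, wrapping periodically.
        window = [f[(i + k) % n] for k in (-1, 0, 1, 2)]
        approx = staggered_dx(window, 1, h)
        exact = math.cos((i + 0.5) * h)
        err = max(err, abs(approx - exact))
    return err


if __name__ == "__main__":
    coarse, fine = max_error(32), max_error(64)
    # Halving the grid spacing should shrink the error by roughly 2**4.
    print(coarse, fine, coarse / fine)
```

Halving the grid spacing reduces the error by a factor close to 16, which is the behavior expected of a fourth-order scheme.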
At the level of a single vector operation, we apply the vector folding technique to AWP-ODC-OS, yielding a 1.6× performance increase over traditional vectorization. Further, we present a novel strategy utilizing both DDR4 RAM and High Bandwidth Memory, increasing the maximum problem size by 26% while still operating at maximum performance. The presented shared- and distributed-memory parallelization carefully schedules work to the cores and ensures that communication and computation overlap. We conclude with a detailed study of AWP-ODC-OS’s full-application performance on the Intel Xeon Phi x200 processor, achieving up to 98.5% of the performance of the most recent GPU generation, the P100. Additionally, our weak scaling study on up to 9,000 nodes of the supercomputer Cori Phase II achieves a parallel efficiency of greater than 91%, equivalent to the performance of over twenty thousand NVIDIA Tesla K20X GPUs.
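The idea behind vector folding can be illustrated by counting how many aligned SIMD blocks a stencil has to read in order to update one output block. The sketch below is a simplified model rather than the AWP-ODC-OS implementation: it assumes 16 single-precision lanes per vector, a symmetric axis-aligned stencil of radius 2, and a hypothetical helper `vectors_touched`; the fold shapes are illustrative choices only.

```python
def vectors_touched(fold, radius):
    """Count distinct aligned SIMD blocks read to update one output block.

    fold   -- (fx, fy, fz): shape of the grid block packed into one vector
    radius -- reach of an axis-aligned (star) stencil along each axis
    """
    # Offsets of a star stencil: the center plus +/-1 .. +/-radius per axis.
    offsets = [(0, 0, 0)]
    for r in range(1, radius + 1):
        for s in (-r, r):
            offsets += [(s, 0, 0), (0, s, 0), (0, 0, s)]

    blocks = set()
    for x in range(fold[0]):
        for y in range(fold[1]):
            for z in range(fold[2]):
                for dx, dy, dz in offsets:
                    # Floor division maps a grid point to its aligned block.
                    blocks.add(((x + dx) // fold[0],
                                (y + dy) // fold[1],
                                (z + dz) // fold[2]))
    return len(blocks)


if __name__ == "__main__":
    # Traditional in-line layout versus two folded layouts (16 lanes each).
    for fold in [(16, 1, 1), (4, 4, 1), (4, 2, 2)]:
        print(fold, vectors_touched(fold, radius=2))
```

In this model the in-line 16×1×1 layout touches 11 blocks, while the folded 4×4×1 and 4×2×2 layouts touch 9 and 7. Fewer blocks per update means less memory traffic, at the cost of additional in-register data rearrangement, which this count does not model.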
Acknowledgements
We acknowledge support from the National Energy Research Scientific Computing Center (NERSC) for access to the Cori supercomputer. We acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper. We thank Jack Deslippe, Sudip Dosanjh and Richard Gerber at NERSC and Lars Koesterke, Tommy Minyard and Dan Stanzione at TACC. The UCSD team was supported by NSF awards ACI-1450451, EAR-1349180, EAR-1135455 and Keck Foundation grant 005590-00001.
Optimization Notice: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tobin, J., Breuer, A., Heinecke, A., Yount, C., Cui, Y. (2017). Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_8
DOI: https://doi.org/10.1007/978-3-319-58667-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58666-3
Online ISBN: 978-3-319-58667-0
eBook Packages: Computer Science, Computer Science (R0)