Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor

Tobin, Josh; Breuer, Alexander; Heinecke, Alexander; Yount, Charles; Cui, Yifeng

doi:10.1007/978-3-319-58667-0_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Included in the conference series: International Conference on High Performance Computing

Abstract

In this work we present AWP-ODC-OS, an end-to-end optimization of AWP-ODC targeting homogeneous manycore supercomputers. AWP-ODC is an established community software package that simulates seismic wave propagation using a staggered-grid finite difference scheme, fourth-order accurate in space and second-order accurate in time. Recent production simulations, e.g., the computation of seismic hazard maps, have relied largely on GPU-accelerated supercomputers. In contrast, our work gives a comprehensive overview of the steps required to achieve near-optimal performance on the Intel® Xeon Phi™ x200 processor (code-named Knights Landing) and compares our competitive performance results with those of the most recent GPU architectures.
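The numerical scheme named above (staggered grid, fourth order in space, second order in time) can be illustrated with a one-dimensional velocity-stress sketch. It uses the standard fourth-order staggered-grid weights 9/8 and -1/24 and a leapfrog time step. The 1D shear-wave equations, the frozen edge points, and the names `d4_staggered`, `leapfrog_step`, and `max_error_sin` are assumptions for illustration only; this is not AWP-ODC-OS code, which works on the full 3D velocity-stress system.

```python
"""Sketch: 1D velocity-stress update on a staggered grid, fourth-order
accurate in space and second-order accurate in time (leapfrog).
Hypothetical illustration; not taken from AWP-ODC-OS."""

import math

C1, C2 = 9.0 / 8.0, -1.0 / 24.0  # standard 4th-order staggered-grid weights


def d4_staggered(f, h):
    """df/dx at the half points x_{i+1/2}, for i = 1 .. len(f) - 3.

    Entry k of the result belongs to i = k + 1 and uses f[i-1] .. f[i+2].
    """
    return [
        (C1 * (f[i + 1] - f[i]) + C2 * (f[i + 2] - f[i - 1])) / h
        for i in range(1, len(f) - 2)
    ]


def leapfrog_step(v, s, rho, mu, h, dt):
    """Advance 1D shear motion by one time step, in place.

    v[i] is velocity at x_i (n points); s[j] is stress at x_{j+1/2}
    (n - 1 points). Equations: rho * dv/dt = ds/dx, ds/dt = mu * dv/dx.
    Points too close to the edges for the 4-point stencil are left
    unchanged (no boundary condition is modeled here).
    """
    n = len(v)
    # Velocity at x_i needs stress at x_{i-3/2}, x_{i-1/2}, x_{i+1/2}, x_{i+3/2}.
    for i in range(2, n - 2):
        ds = (C1 * (s[i] - s[i - 1]) + C2 * (s[i + 1] - s[i - 2])) / h
        v[i] += dt / rho * ds
    # Stress then uses the freshly updated velocities (staggered in time).
    for j in range(1, n - 2):
        dv = (C1 * (v[j + 1] - v[j]) + C2 * (v[j + 2] - v[j - 1])) / h
        s[j] += dt * mu * dv


def max_error_sin(h):
    """Max error of d4_staggered for f = sin on [0, 1] with spacing h."""
    f = [math.sin(i * h) for i in range(round(1 / h) + 1)]
    d = d4_staggered(f, h)
    return max(abs(dk - math.cos((k + 1.5) * h)) for k, dk in enumerate(d))


print(max_error_sin(0.1) / max_error_sin(0.05))  # close to 2**4 = 16
```

Halving the grid spacing cuts the derivative error by roughly 2^4 = 16, which is the signature of fourth-order spatial accuracy; the operator is also exact for polynomials up to degree four.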

At the level of a single vector operation, we apply the vector folding technique to AWP-ODC-OS, yielding a 1.6× performance increase over traditional vectorization. Further, we present a novel strategy that uses both DDR4 RAM and High Bandwidth Memory (HBM), increasing the maximum problem size by 26% while still operating at maximum performance. The presented shared- and distributed-memory parallelization carefully schedules work to the cores and overlaps communication with computation. We conclude with a detailed study of AWP-ODC-OS's full-application performance on the Intel Xeon Phi x200 processor, achieving up to 98.5% of the performance of the most recent GPU generation, the NVIDIA Tesla P100. Additionally, our weak scaling study on up to 9,000 nodes of the Cori Phase II supercomputer achieves a parallel efficiency greater than 91%, equivalent to the performance of more than twenty thousand NVIDIA Tesla K20X GPUs.
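Vector folding (introduced by Yount, HPCC 2015) stores a small multi-dimensional brick of grid points in each SIMD vector instead of a run of consecutive points along x. The Python sketch that follows shows such a layout and counts how many distinct aligned input vectors one output vector of a radius-2 3D star stencil touches, a crude proxy for memory traffic. Assumptions: an 8-lane vector is used for readability (AVX-512 on Knights Landing holds 8 doubles or 16 floats), lanes are ordered x-fastest, the symmetric star stencil stands in for AWP-ODC's staggered operators, and `fold_index`, `star_offsets`, and `bricks_touched` are hypothetical names. The 1.6× figure above comes from the paper's measurements, not from this count.

```python
"""Sketch of the data layout behind vector folding: each SIMD vector
holds an fx*fy*fz brick of grid points. Hypothetical illustration."""

from itertools import product


def fold_index(x, y, z, fold):
    """Map grid point (x, y, z) to (brick coordinates, lane in vector)."""
    fx, fy, fz = fold
    brick = (x // fx, y // fy, z // fz)  # floor division also handles halos < 0
    # Lane order inside a brick: x fastest, then y, then z (an assumption).
    lane = (x % fx) + fx * ((y % fy) + fy * (z % fz))
    return brick, lane


def star_offsets(radius):
    """Offsets of a 3D star stencil (center plus +/-k along each axis)."""
    offs = [(0, 0, 0)]
    for k in range(1, radius + 1):
        for s in (k, -k):
            offs += [(s, 0, 0), (0, s, 0), (0, 0, s)]
    return offs


def bricks_touched(fold, radius):
    """Count distinct input vectors needed to compute one output vector."""
    fx, fy, fz = fold
    touched = set()
    for x, y, z in product(range(fx), range(fy), range(fz)):
        for dx, dy, dz in star_offsets(radius):
            brick, _ = fold_index(x + dx, y + dy, z + dz, fold)
            touched.add(brick)
    return len(touched)


# 1D (traditional), 2D-folded and 3D-folded layouts of an 8-lane vector.
for fold in [(8, 1, 1), (4, 2, 1), (2, 2, 2)]:
    print(fold, bricks_touched(fold, radius=2))
# prints: (8, 1, 1) 11, then (4, 2, 1) 9, then (2, 2, 2) 7
```

Under these assumptions the folded layouts touch fewer distinct vectors than the traditional 1D layout, because neighbors along y and z increasingly fall inside vectors that are already loaded. A real implementation must also weigh the in-register permutes needed to assemble the shifted input vectors.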

Acknowledgements

We acknowledge support from the National Energy Research Scientific Computing Center (NERSC) for access to the Cori supercomputer. We acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper. We thank Jack Deslippe, Sudip Dosanjh and Richard Gerber at NERSC and Lars Koesterke, Tommy Minyard and Dan Stanzione at TACC. The UCSD team was supported by NSF awards ACI-1450451, EAR-1349180, EAR-1135455 and Keck Foundation grant 005590-00001.

Optimization Notice: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.

Author information

Authors and Affiliations

University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
Josh Tobin, Alexander Breuer & Yifeng Cui
Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA, 95054, USA
Alexander Heinecke & Charles Yount

Corresponding author

Correspondence to Josh Tobin.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Tokyo Institute of Technology, Tokyo, Japan
Rio Yokota
Argonne National Laboratory, Argonne, IL, USA
Pavan Balaji
KAUST, Thuwal, Saudi Arabia
David Keyes

About this paper

Cite this paper

Tobin, J., Breuer, A., Heinecke, A., Yount, C., Cui, Y. (2017). Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_8

DOI: https://doi.org/10.1007/978-3-319-58667-0_8
Published: 12 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58666-3
Online ISBN: 978-3-319-58667-0
eBook Packages: Computer Science, Computer Science (R0)