DOI: 10.1145/3659914.3659936
research-article
Open access

Leveraging the High Bandwidth of Last-Level Cache for HPC Seismic Imaging Applications

Published: 03 June 2024

Abstract

We solve the 3D acoustic wave equation using the finite-difference time-domain (FDTD) method in both its first-order and second-order formulations. The FDTD approach is expressed as a stencil-based computational scheme with a long-range discretization, i.e., 8th order in space and 2nd order in time, which is routinely used in the oil and gas industry and in environmental geophysics for high-fidelity subsurface imaging. Absorbing Boundary Conditions (ABCs) are employed to attenuate reflections from artificial boundaries. The high-order discretization engenders extensive data movement across the memory subsystem and may consequently limit kernel throughput because the stencil operator is inherently memory-bound, especially on systems facing memory starvation. The first-order formulation of the 3D acoustic equation further exacerbates this phenomenon because it calculates both the pressure and velocity fields, which corresponds to 1.6X the memory footprint of the second-order formulation. To address this memory bottleneck, we design, implement, and deploy multicore wavefront diamond tiling with temporal blocking (MWD-TB) to boost the performance of seismic wavefield modeling by exploiting spatial and temporal data reuse. MWD-TB leverages the large capacity of the last-level cache (LLC) of modern x86 systems and extracts high memory bandwidth from the underlying architecture. We demonstrate the numerical accuracy of MWD-TB on the Salt3D model from the Society of Exploration Geophysicists (SEG). Compared to traditional spatial blocking alone, our MWD-TB implementations for the first- and second-order FDTD formulations achieve speedups of up to 3.5X and 3X, respectively, on a large grid size on AMD systems equipped with large LLC.
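To make the discretization named in the abstract concrete, the following is a minimal sketch of a second-order-formulation update that is 8th order in space and 2nd order in time. It is reduced to 1D for brevity and is not the authors' implementation. The function names (`second_derivative`, `step`, `run`), the grid parameters, and the zeroed boundary cells are assumptions made for illustration; in particular, the zeroed cells are only a crude stand-in for the ABCs. The stencil weights are the standard 8th-order central-difference coefficients for a second derivative.

```python
# Standard 8th-order central-difference weights for the second derivative
# (radius-4 stencil): centre weight first, then neighbours at distance 1..4.
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]
RADIUS = 4


def second_derivative(u, i, dx):
    """8th-order approximation of d2u/dx2 at interior index i."""
    acc = COEFFS[0] * u[i]
    for r in range(1, RADIUS + 1):
        acc += COEFFS[r] * (u[i - r] + u[i + r])
    return acc / (dx * dx)


def step(p_prev, p_curr, vel, dx, dt):
    """One 2nd-order-in-time (leapfrog) update of the 1D acoustic wave equation.

    Cells within RADIUS of either end are left at zero. This is an
    illustrative assumption, not an absorbing boundary condition.
    """
    n = len(p_curr)
    p_next = [0.0] * n
    for i in range(RADIUS, n - RADIUS):
        lap = second_derivative(p_curr, i, dx)
        p_next[i] = 2.0 * p_curr[i] - p_prev[i] + (vel[i] * dt) ** 2 * lap
    return p_next


def run(n=101, steps=50, dx=10.0, dt=0.001, c=2000.0):
    """Propagate a unit impulse placed at the centre of a homogeneous medium."""
    vel = [c] * n              # hypothetical constant velocity model (m/s)
    p_prev = [0.0] * n
    p_curr = [0.0] * n
    p_curr[n // 2] = 1.0       # point source as an initial condition
    for _ in range(steps):
        p_prev, p_curr = p_curr, step(p_prev, p_curr, vel, dx, dt)
    return p_curr


if __name__ == "__main__":
    field = run()
    print(max(abs(v) for v in field))
```

Each update of one grid point reads nine values of the current wavefield per spatial dimension but performs only a handful of arithmetic operations, which is why a plain sweep of this kind tends to be limited by memory traffic rather than by compute. Temporal blocking schemes such as MWD-TB reorder these same updates so that several time steps are applied to a tile while it is still resident in cache.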


      Published In

      PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference
      June 2024
      296 pages
      ISBN:9798400706394
      DOI:10.1145/3659914
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. seismic modeling
      2. 3D acoustic wave equation
      3. data reuse
      4. spatial/temporal blocking
      5. wavefront parallelism
      6. large cache capacity
      7. oil and gas exploration
      8. environmental geophysics

      Acceptance Rates

      PASC '24 Paper Acceptance Rate 26 of 36 submissions, 72%;
      Overall Acceptance Rate 109 of 221 submissions, 49%
