DOI: 10.1145/3149457.3149478
research-article

Time-space tiling with tile-level parallelism for the 3D FDTD method

Published: 28 January 2018

Abstract

This work aims to improve the performance of a multi-threaded 3D FDTD solver using time-space tiling techniques that enable tile-level parallelization. Our tile-level parallelization is based on the diamond tiling technique. We present a systematic way of introducing time-space tiling into the 3D FDTD solver and compare four different approaches. A performance evaluation on a state-of-the-art multi-core processor demonstrates the effectiveness of time-space tiling with tile-level parallelism for the 3D FDTD method. For a problem with 200³ grid points, our implementation with two-dimensional tile-level parallelism achieved a speedup of 1.88 times over the naive implementation, while for a problem with 300³ grid points, our implementation with one-dimensional tile-level parallelism showed a speedup of 2.22 times. Both results are better than the speedup obtained by an implementation with intra-tile parallelization presented in a previous work.
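To make the idea of diamond tiling with tile-level parallelism concrete, the following minimal sketch applies it to a 1D three-point averaging stencil, used here only as a stand-in for the 3D FDTD update. It is not the authors' implementation: the function names, the tile width `b`, and the stencil itself are assumptions chosen for illustration. Points (t, i) are grouped into diamond-shaped tiles bounded by the lines t + i = const and t - i = const, and tiles are processed wavefront by wavefront. Tiles on the same wavefront do not depend on each other, so they could in principle be given to different threads; this sketch runs them serially and only checks that the tiled order reproduces the naive result.

```python
def naive(u0, steps):
    """Reference version: sweep the whole grid once per time step."""
    cur = list(u0)
    for _ in range(steps):
        nxt = list(cur)  # boundary points (first and last) stay fixed
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def diamond_tiled(u0, steps, b=4):
    """Same computation, executed tile by tile over diamond-shaped tiles.

    b is a hypothetical tile width measured along the skewed axes.
    """
    n = len(u0)
    # buf[t % 2] holds the values at time level t (double buffering).
    buf = [list(u0), list(u0)]

    # Assign every interior point (t, i) to a tile. The two tile
    # coordinates come from the skewed axes t + i and t - i; the offset n
    # only keeps the second coordinate non-negative.
    tiles = {}
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            key = ((t + i) // b, (t - i + n) // b)
            tiles.setdefault(key, []).append((t, i))

    # A point reads (t-1, i-1), (t-1, i), (t-1, i+1), so neither tile
    # coordinate of a source point exceeds that of its consumer. Tiles
    # with the same coordinate sum (one wavefront) are therefore
    # independent, and wavefronts can be processed in increasing order.
    for key in sorted(tiles, key=lambda k: (k[0] + k[1], k)):
        for t, i in tiles[key]:  # points were appended in increasing t
            src, dst = buf[(t - 1) % 2], buf[t % 2]
            dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0

    return buf[steps % 2]


if __name__ == "__main__":
    grid = [float((7 * i) % 11) for i in range(37)]
    print(diamond_tiled(grid, 10) == naive(grid, 10))
```

In a multi-threaded setting, the inner loop over tiles of one wavefront is the natural place to distribute work across threads, with a barrier between wavefronts. Choosing `b` so that a tile's working set fits in cache is what gives time-space tiling its data reuse across time steps.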


    Published In

    HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
    January 2018
    322 pages
    ISBN: 9781450353724
    DOI: 10.1145/3149457
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D FDTD method
    2. iterative stencil computation
    3. parallel computing
    4. time-space tiling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HPC Asia 2018

    Acceptance Rates

    HPCAsia '18 Paper Acceptance Rate 30 of 67 submissions, 45%;
    Overall Acceptance Rate 69 of 143 submissions, 48%

    Article Metrics

    • Downloads (last 12 months): 19
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 20 Feb 2025.

    Cited By

    • (2021) Elastodynamic full waveform inversion on GPUs with time-space tiling and wavefield reconstruction. The Journal of Supercomputing 77:3 (2416-2457). DOI: 10.1007/s11227-020-03352-5. Online publication date: 1-Mar-2021
    • (2020) Multiplicative Schwartz-Type Block Multi-Color Gauss-Seidel Smoother for Algebraic Multigrid Methods. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (217-226). DOI: 10.1145/3368474.3368481. Online publication date: 15-Jan-2020
    • (2020) Integrating Cache Oblivious Approach with Modern Processor Architecture. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (123-130). DOI: 10.1145/3368474.3368477. Online publication date: 15-Jan-2020
    • (2018) Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy. 2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA) (19-24). DOI: 10.1109/NVMSA.2018.00016. Online publication date: Aug-2018
    • (2018) The DiamondCandy LRnLA algorithm: raising efficiency of the 3D cross-stencil schemes. The Journal of Supercomputing 75:12 (7778-7789). DOI: 10.1007/s11227-018-2461-z. Online publication date: 23-Jun-2018
