Autotuning of Adaptive Mesh Refinement PDE Solvers on Shared Memory Architectures

Nogina, Svetlana; Unterweger, Kristof; Weinzierl, Tobias

doi:10.1007/978-3-642-31464-3_68

Svetlana Nogina¹⁹,
Kristof Unterweger¹⁹ &
Tobias Weinzierl¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7203))

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

2152 Accesses

Abstract

Many multithreaded, grid-based, dynamically adaptive solvers for partial differential equations permanently have to traverse subgrids (patches) of different and changing sizes. The parallel efficiency of this traversal depends on the interplay of the patch size, the architecture used, the operations triggered throughout the traversal, and the grain size, i.e. the size of the subtasks the patch is broken into. We propose an oracle mechanism delivering grain sizes on-the-fly. It takes historical runtime measurements for different patch and grain sizes as well as the traverse’s operations into account, and it yields reasonable speedups. Neither magic configuration settings nor an expensive pre-tuning phase are necessary. It is an autotuning approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

An Experience Report on (Auto-)tuning of Mesh-Based PDE Solvers on Shared Memory Systems

Task-Based Sparse Hybrid Linear Solver for Distributed Memory Heterogeneous Architectures

OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver

Article 30 July 2018

References

Bilmes, J., Asanovic, K., Chin, C.-W., Demmel, J.: Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In: Proceedings of the 11th International Conference on Supercomputing, pp. 340–347 (1997)
Google Scholar
Cariño, R.L., Banicescu, I.: Dynamic Scheduling Parallel Loops With Variable Iterate Execution Times. In: 16th International Parallel and Distributed Processing Symposium (IPDPS 2002). IEEE (2002); electonical proceedings
Google Scholar
Chapman, B., Jost, G., van der Pas, R.: Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press (2007)
Google Scholar
Cuenca, J., García, L.-P., Giménez, D.: A proposal for autotuning linear algebra routines on multicore platforms. Procedia CS 1(1), 515–523 (2010)
Article Google Scholar
Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., Yelick, K.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pp. 4:1–4:12. IEEE Press (2008)
Google Scholar
Eckhardt, W., Weinzierl, T.: A Blocking Strategy on Multicore Architectures for Dynamically Adaptive PDE Solvers. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 567–575. Springer, Heidelberg (2010)
Chapter Google Scholar
Gmeiner, B., Gradl, T., Köstler, H., Rüde, U.: Highly parallel geometric multigrid for hierarchical hybrid grids on Blue-Gene/P. Numer. Linear Algebr. (submitted)
Google Scholar
Hummel, S.F., Schmidt, J., Uma, R.N., Wein, J.: Load-Sharing in Heterogeneous Systems via Weighted Factoring. In: Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 318–328. ACM (1997)
Google Scholar
Kamil, S., Chan, C., Oliker, L., Shalf, J., Williams, S.: An auto-tuning framework for parallel multicore stencil computations. In: 24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, pp. 1–12. IEEE (2010)
Google Scholar
Parker, S.G.: A component-based architecture for parallel multi-physics pde simulation. FGCS 22(1-2), 204–216 (2006)
Article Google Scholar
Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly Media (2007)
Google Scholar
Sutter, H.: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb’s Journal 3(30), 202–210 (2005)
Google Scholar
Tang, P., Yew, P.: Processor self-scheduling for multiple-nested parallel loops. In: Proceedings of the 1986 International Conference on Parallel Processing, pp. 528–535. IEEE (1986)
Google Scholar
Vuduc, R.: Automatic assembly of highly tuned code fragments (1998), www.cs.berkeley.edu/~richie/stat242/project
Weinzierl, T.: A Framework for Parallel PDE Solvers on Multiscale Adaptive Cartesian Grids. Verlag Dr. Hut (2009)
Google Scholar
Weinzierl, T., Köppl, T.: A geometric space-time multigrid algorithm for the heat equation. In: NMTMA (accepted)
Google Scholar
Whaley, R.C., Petitet, A., Dongarra, J.: Automated empirical optimizations of software and the ATLAS project. Parallel Comput. 27, 3–35 (2001)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

Technische Universität München, Boltzmannstr. 3, 85748, Garching, Germany
Svetlana Nogina, Kristof Unterweger & Tobias Weinzierl

Authors

Svetlana Nogina
View author publications
You can also search for this author in PubMed Google Scholar
Kristof Unterweger
View author publications
You can also search for this author in PubMed Google Scholar
Tobias Weinzierl
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Dabrowskiego 69, 42-201, Czestochowa, Poland
Roman Wyrzykowski & Konrad Karczewski &
Electrical Engineering and Computer Science Department, University of Tennessee, 1122 Volunteer Blvd, 37996-3450, Knoxville, TN, USA
Jack Dongarra
Department of Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, 2800, Kongens Lyngby, Denmark
Jerzy Waśniewski

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Nogina, S., Unterweger, K., Weinzierl, T. (2012). Autotuning of Adaptive Mesh Refinement PDE Solvers on Shared Memory Architectures. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_68

Download citation

DOI: https://doi.org/10.1007/978-3-642-31464-3_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31463-6
Online ISBN: 978-3-642-31464-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics