
An Experience Report on (Auto-)tuning of Mesh-Based PDE Solvers on Shared Memory Systems

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10778)

Abstract

With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that let the algorithm expose high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussing. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning is important here, as the grid and task workload are multifaceted and change frequently at runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms, and we clarify that such an approach does not free users completely from grain size awareness.
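The grain-size tradeoff described above can be illustrated with a minimal measurement-driven sketch. This is not the paper's machine-learning autotuner; the per-task overhead constant, the stand-in kernel, and all names are illustrative assumptions only.

```python
import time

TASK_OVERHEAD = 5e-6  # assumed fixed scheduling cost per spawned task (seconds)

def work(chunk):
    """Stand-in compute kernel: sum of squares over a chunk of cell indices."""
    return sum(i * i for i in chunk)

def run_with_grain(n, grain):
    """Split range(n) into tasks of at most `grain` cells.

    Returns (result, number of tasks). In a real shared-memory code each
    task would be handed to a thread pool; here we only count the tasks so
    that a cost model can charge overhead per task.
    """
    total, tasks = 0, 0
    for start in range(0, n, grain):
        total += work(range(start, min(start + grain, n)))
        tasks += 1
    return total, tasks

def pick_grain(n, candidates):
    """Time each candidate grain size and return the cheapest one.

    The cost model adds a modelled per-task overhead to the measured
    runtime: tiny grains create many tasks (high overhead), huge grains
    create few tasks (little concurrency to exploit).
    """
    best, best_cost = None, float("inf")
    for g in candidates:
        t0 = time.perf_counter()
        _, tasks = run_with_grain(n, g)
        cost = (time.perf_counter() - t0) + tasks * TASK_OVERHEAD
        if cost < best_cost:
            best, best_cost = g, cost
    return best
```

A production autotuner would, as the paper stresses, keep re-measuring at runtime because the grid and task workload change; this toy searches once over a fixed candidate set.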

This work received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 671698 (ExaHyPE). It made use of the facilities of the Hamilton HPC Service of Durham University. The authors furthermore gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (www.lrz.de). All software is freely available from www.exahype.eu.




Author information


Corresponding author

Correspondence to Dominic E. Charrier.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Charrier, D.E., Weinzierl, T. (2018). An Experience Report on (Auto-)tuning of Mesh-Based PDE Solvers on Shared Memory Systems. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds.) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol. 10778. Springer, Cham. https://doi.org/10.1007/978-3-319-78054-2_1


  • DOI: https://doi.org/10.1007/978-3-319-78054-2_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78053-5

  • Online ISBN: 978-3-319-78054-2

  • eBook Packages: Computer Science, Computer Science (R0)
