Abstract
With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that let the algorithm expose high concurrency at low overhead. Many papers do not detail what reasonable task sizes are and treat their findings as craftsmanship not worth discussing. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a solver for hyperbolic equation systems. Autotuning is important here because the grid and the task workload are multifaceted and change frequently at runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms, and we clarify that such an approach does not completely free users from grain size awareness.
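To make the notion of tuning a grain size concrete, here is a minimal sketch of a grain size autotuner. It is not the algorithm from this paper. It is a plain hill-climbing search that halves or doubles the grain size while the measured cost improves. The function name `tune_grain_size`, its parameters and the synthetic cost model in the example are hypothetical. The sketch assumes that the cost of one run (e.g. the wall-clock time of one mesh sweep) can be measured for a given grain size and that a smaller cost is better.

```python
from typing import Callable


def tune_grain_size(
    measure: Callable[[int], float],
    start: int,
    min_grain: int = 1,
    max_grain: int = 1 << 20,
) -> int:
    """Hypothetical hill-climbing grain size search (illustrative sketch only).

    measure(g) returns the observed cost of one run with grain size g,
    e.g. wall-clock time; smaller is better.
    """
    # Clamp the starting value into the admissible range.
    current = max(min_grain, min(start, max_grain))
    best_cost = measure(current)
    while True:
        # Neighbours: halve and double the current grain size, clamped to
        # the admissible range; drop the current value if clamping hits it.
        candidates = {
            max(min_grain, current // 2),
            min(max_grain, current * 2),
        } - {current}
        improved = False
        for g in sorted(candidates):
            cost = measure(g)
            if cost < best_cost:
                # Move to the better neighbour and keep searching from there.
                best_cost, current, improved = cost, g, True
        if not improved:
            # Neither neighbour is cheaper: a (possibly local) minimum.
            return current


if __name__ == "__main__":
    # Hypothetical synthetic cost: per-task overhead falls like 1000/g,
    # while end-of-sweep load imbalance grows like g.
    def model(g: int) -> float:
        return 1000 / g + g

    print(tune_grain_size(model, start=1))     # → 32
    print(tune_grain_size(model, start=1024))  # → 32
```

With this synthetic cost, the search settles on a grain size of 32 whether it starts from 1 or from 1024. In practice, `measure` would time a real parallel sweep, and the search would be restarted whenever the grid or workload changes. A hill climber like this can stop in a local minimum and depends on a sensible starting value. This is consistent with the observation above that autotuning does not completely free users from grain size awareness.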
This work received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 671698 (ExaHyPE). It made use of the facilities of the Hamilton HPC Service of Durham University. The authors furthermore gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (www.lrz.de). All software is freely available from www.exahype.eu.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Charrier, D.E., Weinzierl, T. (2018). An Experience Report on (Auto-)tuning of Mesh-Based PDE Solvers on Shared Memory Systems. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10778. Springer, Cham. https://doi.org/10.1007/978-3-319-78054-2_1
DOI: https://doi.org/10.1007/978-3-319-78054-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78053-5
Online ISBN: 978-3-319-78054-2
eBook Packages: Computer Science, Computer Science (R0)