
Applicability of the ECM Performance Model to Explicit ODE Methods on Current Multi-core Processors

  • Conference paper
High Performance Computing (ISC High Performance 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10876)


Abstract

To support the portability of efficiency when bringing an application from scientific computing to a new HPC system, autotuning techniques are promising approaches. Ideally, these approaches are able to derive an efficient implementation for a specific HPC system by applying suitable program transformations. Often, a large number of implementations results, and the most efficient of these variants should be selected. In this article, we investigate performance modelling and prediction techniques which can support the selection process. These techniques may significantly reduce the selection effort, compared to extensive runtime tests. We apply the execution-cache-memory (ECM) performance model to numerical solution methods for ordinary differential equations (ODEs). In particular, we consider the question whether it is possible to obtain a performance prediction for the resulting implementation variants to support the variant selection. We investigate the accuracy of the prediction for different ODEs and different hardware platforms and show that the prediction is able to reliably select a set of fast variants and, thus, to limit the search space for possible later empirical tuning.
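The selection idea can be illustrated with a minimal sketch. It assumes the commonly used single-core ECM formulation for Intel processors, in which the predicted cycles per unit of work are the maximum of the overlapping in-core time and the sum of the non-overlapping in-core time and the data transfer times between adjacent memory levels. The class `EcmInputs`, the function names, the 10% tolerance, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcmInputs:
    """Per-variant model inputs in cycles per unit of work (hypothetical values)."""
    name: str
    t_ol: float     # in-core time that overlaps with data transfers
    t_nol: float    # in-core time that does not overlap (loads/stores)
    t_l1l2: float   # transfer time L1 <-> L2
    t_l2l3: float   # transfer time L2 <-> L3
    t_l3mem: float  # transfer time L3 <-> main memory


def ecm_cycles(v: EcmInputs) -> float:
    """Single-core ECM prediction, assuming non-overlapping data transfers."""
    return max(v.t_ol, v.t_nol + v.t_l1l2 + v.t_l2l3 + v.t_l3mem)


def shortlist(variants, tolerance=0.10):
    """Return the names of all variants predicted within `tolerance` of the best one."""
    predicted = {v.name: ecm_cycles(v) for v in variants}
    best = min(predicted.values())
    keep = [n for n, t in predicted.items() if t <= best * (1.0 + tolerance)]
    return sorted(keep, key=predicted.get)


# Made-up inputs for three implementation variants.
demo = [
    EcmInputs("A", t_ol=8.0, t_nol=4.0, t_l1l2=2.0, t_l2l3=2.0, t_l3mem=4.0),
    EcmInputs("B", t_ol=20.0, t_nol=4.0, t_l1l2=1.0, t_l2l3=1.0, t_l3mem=2.0),
    EcmInputs("C", t_ol=6.0, t_nol=4.0, t_l1l2=2.0, t_l2l3=2.0, t_l3mem=5.0),
]

if __name__ == "__main__":
    print(shortlist(demo))
```

Only the shortlisted variants would then need to be timed empirically, which is the sense in which a prediction limits the search space.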



Acknowledgments

This work is supported by the German Ministry of Science and Education (BMBF) under project number 01IH16012A. Discussions with Julian Hammer (RRZE) are gratefully acknowledged.

Author information

Corresponding author

Correspondence to Johannes Seiferth.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Seiferth, J., Alappat, C., Korch, M., Rauber, T. (2018). Applicability of the ECM Performance Model to Explicit ODE Methods on Current Multi-core Processors. In: Yokota, R., Weiland, M., Keyes, D., Trinitis, C. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 10876. Springer, Cham. https://doi.org/10.1007/978-3-319-92040-5_9


  • DOI: https://doi.org/10.1007/978-3-319-92040-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92039-9

  • Online ISBN: 978-3-319-92040-5

  • eBook Packages: Computer Science (R0)
