Multidimensional Performance and Scalability Analysis for Diverse Applications Based on System Monitoring Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10777)

Abstract

The availability of high-performance computing resources enables us to perform very large numerical simulations and thereby tackle challenging real-life problems. At the same time, the ever-growing complexity of computer architectures places high demands on algorithms and their implementation if the computational power at our disposal is to be utilized efficiently.

Large-scale high-performance simulations can be performed by using available general-purpose libraries, by writing libraries that suit particular classes of problems, or by developing software from scratch. Clearly, the scope for enhancing the efficiency of the software tools differs greatly among the three cases, ranging from nearly nonexistent to complete. In this work we exemplify the efficiency of the three approaches on benchmark problems, using monitoring tools that provide a very rich spectrum of data on the performance of the applied codes as well as on the utilization of the supercomputer itself.

Notes

  1. OpenMP threading and GPU kernels have been written in a separate branch and have not been used in this study.

Acknowledgements

The research work of the authors was partly supported by The Swedish Foundation for International Cooperation in Research and Higher Education (STINT) Initiation grant IB2016-6543, entitled ‘Large scale complex numerical simulations on large scale complex computer facilities - identifying performance and scalability issues’, 2016–2017.

The performance evaluation and all large-scale tests were made possible by access to the supercomputer Lomonosov-2 at the Research Computing Center of Lomonosov Moscow State University, Russia.

The results were obtained at Lomonosov Moscow State University with the financial support of the Russian Science Foundation (agreement N 17-71-20114) for the part concerning the efficiency analysis of the Chunks and Tasks model (Sect. 4.3). The work on the applications described in Sects. 4.2 and 4.4 was supported by the Russian Foundation for Basic Research (project 16-07-01003 for the part concerning scalability analysis, and project 17-07-00719 for the part concerning system monitoring data management). This is hereby gratefully acknowledged.

Numerous valuable discussions with Emanuel H. Rubensson and Elias Rudberg, as well as their contributions to correcting the paper, are hereby also gratefully acknowledged.

Author information

Corresponding author

Correspondence to Dmitry Nikitenko.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Neytcheva, M. et al. (2018). Multidimensional Performance and Scalability Analysis for Diverse Applications Based on System Monitoring Data. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_37

  • DOI: https://doi.org/10.1007/978-3-319-78024-5_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78023-8

  • Online ISBN: 978-3-319-78024-5

  • eBook Packages: Computer Science, Computer Science (R0)
