
Visualizing, Measuring, and Tuning Adaptive MPI Parameters

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11027))

Abstract

Adaptive MPI (AMPI) is an advanced MPI runtime environment that offers several features beyond traditional MPI runtimes, which can lead to better utilization of the underlying hardware platform and therefore higher performance. These features are overdecomposition through virtualization, and load balancing via rank migration. However, choosing which of these features to use and finding their optimal parameters is a challenging task, since different applications and systems may require different options. Furthermore, there is a lack of information about the impact of each option. In this paper, we present a new visualization of AMPI in its companion Projections tool, which depicts the operation of an MPI application and details the impact of the different AMPI features on its resource usage. We show how these visualizations can help improve the efficiency and execution time of an MPI application. Applying optimizations indicated by the performance analysis to two MPI-based applications results in performance improvements of up to 18% from overdecomposition and load balancing.
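The two tunables the abstract names, overdecomposition and load balancing, are typically exposed as runtime parameters in AMPI. As a rough illustration (the exact flags and program name here are assumptions; consult the AMPI manual for the current interface), an MPI code built with AMPI's compiler wrapper can be launched with more virtual ranks than physical processors and with a load-balancing strategy selected at the command line:

```shell
# Build the MPI application with AMPI's compiler wrapper; the
# tracing flag (illustrative) emits logs that Projections can load.
ampicc -o jacobi jacobi.c -tracemode projections

# Run on 4 physical processors with 16 virtual ranks
# (an overdecomposition ratio of 4), selecting a greedy
# load balancer that migrates ranks at runtime.
./charmrun +p4 ./jacobi +vp16 +balancer GreedyLB
```

Choosing the virtualization ratio (+vp relative to +p) and the balancing strategy is exactly the parameter-tuning problem the paper's visualizations are meant to inform.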


Notes

  1. https://codesign.llnl.gov/lulesh.php.

  2. https://github.com/ParRes/Kernels.

  3. https://charm.cs.illinois.edu/software.


Acknowledgments

This paper is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374.

Author information

Corresponding author

Correspondence to Matthias Diener.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Diener, M., White, S., Kale, L.V. (2019). Visualizing, Measuring, and Tuning Adaptive MPI Parameters. In: Bhatele, A., Boehme, D., Levine, J., Malony, A., Schulz, M. (eds) Programming and Performance Visualization Tools. ESPT/VPA 2017, ESPT/VPA 2018. Lecture Notes in Computer Science, vol 11027. Springer, Cham. https://doi.org/10.1007/978-3-030-17872-7_13


  • DOI: https://doi.org/10.1007/978-3-030-17872-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17871-0

  • Online ISBN: 978-3-030-17872-7

  • eBook Packages: Computer Science (R0)
