DOI: 10.1145/2802658.2802667

MPI Advisor: a Minimal Overhead Tool for MPI Library Performance Tuning

Published: 21 September 2015

ABSTRACT

A majority of parallel applications executed on HPC clusters use MPI for communication between processes. Most users treat MPI as a black box, executing their programs with the cluster's default settings. While the default settings perform adequately in many cases, it is well known that optimizing the MPI environment can significantly improve application performance. Although existing optimization tools are effective when used by performance experts, they require deep knowledge of MPI library behavior and of the underlying hardware architecture on which the application will be executed. An easy-to-use tool that provides recommendations for configuring the MPI environment to optimize application performance is therefore highly desirable. This paper addresses this need by presenting an easy-to-use methodology and tool, named MPI Advisor, that requires just a single execution of the input application to characterize its predominant communication behavior and determine the MPI configuration that may enhance its performance on the target combination of MPI library and hardware architecture. Currently, MPI Advisor provides recommendations that address the four most common MPI-related performance bottlenecks, which concern the choice of: 1) point-to-point protocol (eager vs. rendezvous), 2) collective communication algorithm, 3) MPI task-to-core mapping, and 4) InfiniBand transport protocol. The performance gains obtained by implementing the recommended optimizations in the case studies presented in this paper range from a few percent to more than 40%. Specifically, using this tool, we were able to improve the performance of HPCG with MVAPICH2 on four nodes of the Stampede cluster from 6.9 GFLOP/s to 10.1 GFLOP/s. Since the tool provides application-specific recommendations, it also informs the user about correct usage of MPI.
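To make the four recommendation categories concrete, the following is a minimal sketch, not the authors' implementation, of how an advisor-style tool might map a profiled communication pattern to library settings. The helper names, input format, and decision thresholds are hypothetical; the MVAPICH2 environment variables shown (MV2_IBA_EAGER_THRESHOLD, MV2_CPU_BINDING_POLICY, MV2_USE_UD_HYBRID) come from the MVAPICH2 user guide and should be verified against the installed library version. Collective-algorithm selection (category 2) is omitted here because its tuning variables are highly version-specific.

# Illustrative sketch only: map a profiled point-to-point message-size
# histogram and job layout to candidate MVAPICH2 settings. Helper names,
# thresholds, and the input format are hypothetical.

from collections import Counter

def dominant_message_size(histogram: Counter) -> int:
    """Return the point-to-point message size (in bytes) observed most often."""
    size, _count = histogram.most_common(1)[0]
    return size

def recommend_settings(histogram: Counter, ranks_per_node: int, total_ranks: int) -> dict:
    """Map a profiled communication pattern to candidate MVAPICH2 settings."""
    recs = {}
    dominant = dominant_message_size(histogram)

    # 1) Point-to-point protocol: if most messages fall under a modest size,
    #    raising the eager threshold just above them avoids the rendezvous
    #    handshake (the 64 KiB cutoff here is a hypothetical example).
    if dominant <= 64 * 1024:
        recs["MV2_IBA_EAGER_THRESHOLD"] = str(dominant + 1024)

    # 3) Task-to-core mapping: "bunch" keeps neighboring ranks on the same
    #    socket, "scatter" spreads them across sockets (heuristic only).
    recs["MV2_CPU_BINDING_POLICY"] = "bunch" if ranks_per_node <= 8 else "scatter"

    # 4) InfiniBand transport: at large scale the connectionless UD/hybrid
    #    transport reduces per-connection memory (1024-rank cutoff is made up).
    if total_ranks >= 1024:
        recs["MV2_USE_UD_HYBRID"] = "1"

    return recs

if __name__ == "__main__":
    # Toy histogram from one profiled run: {message size in bytes: count}.
    sizes = Counter({8192: 120000, 1048576: 300})
    for var, value in recommend_settings(sizes, ranks_per_node=16, total_ranks=2048).items():
        print(f"export {var}={value}")

In practice such settings would simply be exported in the job script before mpirun; the point of the sketch is only that each recommendation category reduces to a small set of environment variables chosen from a single profiled run.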


Published in
EuroMPI '15: Proceedings of the 22nd European MPI Users' Group Meeting
September 2015, 149 pages
ISBN: 978-1-4503-3795-3
DOI: 10.1145/2802658
Copyright © 2015 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers
• Research article (refereed)

Acceptance Rates
EuroMPI '15 paper acceptance rate: 14 of 29 submissions (48%)
Overall acceptance rate: 66 of 139 submissions (47%)
