ABSTRACT
A majority of parallel applications executed on HPC clusters use MPI for communication between processes. Most users treat MPI as a black box, executing their programs with the cluster's default settings. While the default settings perform adequately in many cases, it is well known that tuning the MPI environment can significantly improve application performance. Although existing optimization tools are effective when used by performance experts, they require deep knowledge of MPI library behavior and of the underlying hardware architecture on which the application will execute. Therefore, an easy-to-use tool that provides recommendations for configuring the MPI environment to optimize application performance is highly desirable. This paper addresses this need by presenting an easy-to-use methodology and tool, named MPI Advisor, that requires just a single execution of the input application to characterize its predominant communication behavior and determine the MPI configuration that may enhance its performance on the target combination of MPI library and hardware architecture. Currently, MPI Advisor provides recommendations that address the four most commonly occurring MPI-related performance bottlenecks, which concern the choice of: 1) point-to-point protocol (eager vs. rendezvous), 2) collective communication algorithm, 3) MPI tasks-to-cores mapping, and 4) InfiniBand transport protocol. The performance gains obtained by implementing the recommended optimizations in the case studies presented in this paper range from a few percent to more than 40%. Specifically, using this tool, we were able to improve the performance of HPCG with MVAPICH2 on four nodes of the Stampede cluster from 6.9 GFLOP/s to 10.1 GFLOP/s. Since the tool provides application-specific recommendations, it also informs the user about correct usage of MPI.
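To make the first of the four tuning targets concrete, the sketch below is a minimal MPI ping-pong micro-benchmark in C, written as a hypothetical illustration rather than as part of MPI Advisor itself. Its message size sits near a typical eager/rendezvous switch point, so its round-trip latency is sensitive to the point-to-point protocol threshold. The launch command in the comment assumes MVAPICH2 environment variables such as MV2_IBA_EAGER_THRESHOLD and MV2_CPU_BINDING_POLICY; the exact variable names, values, and defaults depend on the MPI library and version installed on the target system.

```c
/* Hypothetical illustration (not part of MPI Advisor): a ping-pong whose
 * message size lies near common eager/rendezvous cut-offs, so its latency
 * depends on the point-to-point protocol threshold chosen by the library.
 *
 * With MVAPICH2 it might be launched along the lines of
 *   MV2_IBA_EAGER_THRESHOLD=65536 MV2_CPU_BINDING_POLICY=scatter \
 *     mpirun -np 2 ./pingpong
 * (variable names and defaults depend on the MPI library and version).
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (32 * 1024)   /* near typical eager/rendezvous thresholds */
#define ITERS     1000

int main(int argc, char **argv)
{
    int rank, size;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    buf = malloc(MSG_BYTES);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    /* ranks 0 and 1 exchange a fixed-size message ITERS times */
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("avg round-trip time: %.2f us\n", 1e6 * (t1 - t0) / ITERS);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Running such a micro-benchmark once with the library's default threshold and once with an adjusted one is a simple way to check, on a given cluster, whether a recommendation of type 1) actually pays off before applying it to a full application run.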