DOI: 10.1145/2616498.2616535
research-article

PGDB: A Debugger for MPI Applications

Published: 13 July 2014

Abstract

As MPI applications scale to larger machines, errors that stayed hidden during testing at smaller scales begin to manifest. Debuggers must therefore be extended to work at these scales so that correct applications can be developed efficiently. PGDB, the Parallel GDB, is an open-source debugger for MPI applications that provides this capability. It is designed from the ground up to be a robust debugging environment at scale, while presenting an interface similar to that of the typical command-line GDB debugger. Its use on representative debugging problems is demonstrated, and its scalability on the Stampede supercomputer is evaluated.

Cited By

  • (2020) Log Discovery for Troubleshooting Open Distributed Systems with TLQ. Practice and Experience in Advanced Research Computing 2020: Catch the Wave, pages 224-231. DOI: 10.1145/3311790.3396633. Online publication date: 26-Jul-2020
  • (2015) Providing Parallel Debugging for DASH Distributed Data Structures with GDB. Procedia Computer Science, 51:C, pages 1383-1392. DOI: 10.1016/j.procs.2015.05.345. Online publication date: 1-Sep-2015

Published In

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
July 2014
445 pages
ISBN:9781450328937
DOI:10.1145/2616498
  • General Chair: Scott Lathrop
  • Program Chair: Jay Alameda
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • NSF: National Science Foundation
  • Drexel University
  • Indiana University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Debugging
  2. Distributed debugging
  3. MPI
  4. Parallel debugging
  5. XSEDE

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE '14

Acceptance Rates

XSEDE '14 Paper Acceptance Rate 80 of 120 submissions, 67%;
Overall Acceptance Rate 129 of 190 submissions, 68%
