A debugger for flow graph based parallel applications

DOI: 10.1145/1273647.1273651
Published: 09 July 2007

Abstract

Flow graphs provide an explicit description of the parallelization of an application by mapping vertices onto serial computations and edges onto message transfers. We present the design and implementation of a debugger for the flow graph based Dynamic Parallel Schedules (DPS) parallelization framework. We use the flow graph to provide both a high-level and a detailed picture of the current state of the application execution. We describe how reordering incoming messages enables testing for the presence of message races while debugging a parallel application. Knowledge of the causal dependencies between messages enables tracking messages and computations along individual branches of the flow graph. In addition to common features such as restricting the analysis to a subset of threads or processes and attaching sequential debuggers to running processes, the proposed debugger also supports message alterations and message-content-dependent breakpoints.
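
The idea of testing for message races by reordering incoming messages can be illustrated with a minimal sketch. It is not the DPS API and not the debugger's actual mechanism: the names `run_vertex`, `has_message_race`, and the two handlers are hypothetical, and the exhaustive permutation of delivery orders is a brute-force stand-in chosen for illustration. A vertex is modeled as a serial function that folds incoming messages into a state; if two delivery orders produce different final states, the vertex is sensitive to message ordering.

```python
from itertools import permutations


def run_vertex(handler, messages, initial_state):
    """Deliver messages to a serial vertex computation in the given order."""
    state = initial_state
    for msg in messages:
        state = handler(state, msg)
    return state


def has_message_race(handler, messages, initial_state):
    """Return True if at least two delivery orders give different final states.

    Assumption: states are hashable, and the message set is small enough
    to enumerate every ordering.
    """
    outcomes = {
        run_vertex(handler, order, initial_state)
        for order in permutations(messages)
    }
    return len(outcomes) > 1


def sum_handler(state, msg):
    # Order-independent: addition is commutative and associative.
    return state + msg


def last_writer_handler(state, msg):
    # Order-dependent: the final state is whichever message arrived last.
    return msg


incoming = [3, 5, 7]
sum_is_racy = has_message_race(sum_handler, incoming, 0)
overwrite_is_racy = has_message_race(last_writer_handler, incoming, 0)
print(sum_is_racy, overwrite_is_racy)
```

Here the summing vertex yields the same state for every ordering, while the overwriting vertex does not, which is the kind of behavior that reordering messages during a debugging session is meant to expose.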

Published In

PADTAD '07: Proceedings of the 2007 ACM workshop on Parallel and distributed systems: testing and debugging
July 2007
72 pages
ISBN: 9781595937483
DOI: 10.1145/1273647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. flow graph based debugging
  2. flow graph based parallel applications
  3. message race detection
  4. message reordering
  5. parallel schedules

Qualifiers

  • Article

Conference

ISSTA07
Cited By

  • (2017) 10 Years of research on debugging concurrent and multicore software. Software Quality Journal 25:1 (49-82). DOI: 10.1007/s11219-015-9301-7. Online publication date: 1-Mar-2017
  • (2009) Tools and strategies for debugging distributed stream processing applications. Software—Practice & Experience 39:16 (1347-1376). DOI: 10.5555/1656321.1656323. Online publication date: 1-Nov-2009
  • (2009) Event Recording System in Large Scale Distributed Parallel DSP Processing. 2009 International Conference on Information Engineering and Computer Science (1-4). DOI: 10.1109/ICIECS.2009.5363735. Online publication date: Dec-2009
  • (2009) Tools and strategies for debugging distributed stream processing applications. Software: Practice and Experience 39:16 (1347-1376). DOI: 10.1002/spe.941. Online publication date: 2-Oct-2009
  • (2008) Visual Debugging of MPI Applications. Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface (239-247). DOI: 10.1007/978-3-540-87475-1_33. Online publication date: 7-Sep-2008