ABSTRACT
Analyzing the performance of large-scale scientific applications is becoming increasingly difficult due to the sheer volume of performance data gathered. Recent work on scalable communication tracing applies online interprocess compression to address this problem. Yet analyzing communication traces requires knowledge of time progression that cannot trivially be encoded in a scalable manner during compression. We develop scalable time stamp encoding schemes for communication traces.
Our work also contributes novel insights into the scalable representation of time-stamped data. We show that our representations capture sufficient information to enable what-if explorations of architectural variations and analysis of path-based timing irregularities without requiring excessive disk space. We evaluate the ability of several time-stamped, compressed MPI trace approaches to enable accurate timed replay of communication events. Our lossless traces are orders of magnitude smaller, if not near-constant in size, regardless of the number of nodes, while preserving timing information suitable for application tuning or for assessing the requirements of future procurements. Our results demonstrate that time-preserving tracing without loss of communication information can scale in both the number of nodes and the number of time steps, a result without precedent.
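The abstract does not spell out the encoding schemes themselves, but the core idea behind near-constant-size timed traces can be illustrated with one plausible approach: store inter-event time deltas rather than absolute time stamps, and aggregate the deltas of repeated events (e.g., across loop iterations) into compact statistics. The sketch below is purely illustrative; the class and function names (`DeltaStats`, `encode`, `replay_duration`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the paper's actual scheme): delta time stamp
# encoding with per-event statistics. Instead of one absolute time stamp
# per event per iteration, we keep inter-event deltas and fold repeats
# into (count, min, max, mean), so the encoded size stays near-constant
# in the number of iterations.
from dataclasses import dataclass, field

@dataclass
class DeltaStats:
    count: int = 0
    total: float = 0.0
    lo: float = field(default=float("inf"))
    hi: float = field(default=float("-inf"))

    def add(self, delta: float) -> None:
        self.count += 1
        self.total += delta
        self.lo = min(self.lo, delta)
        self.hi = max(self.hi, delta)

    @property
    def mean(self) -> float:
        return self.total / self.count

def encode(events):
    """events: list of (event_id, absolute_time) in program order.

    Returns a dict mapping each event id to the statistics of the
    deltas between it and the immediately preceding event.
    """
    stats = {}
    prev_t = None
    for eid, t in events:
        if prev_t is not None:
            stats.setdefault(eid, DeltaStats()).add(t - prev_t)
        prev_t = t
    return stats

def replay_duration(stats, schedule):
    """Approximate total elapsed time of a replayed event schedule,
    using the mean recorded delta for each event."""
    return sum(stats[eid].mean for eid in schedule if eid in stats)
```

A timed replay can then reproduce approximate inter-event timing from the statistics alone, without the original per-iteration time stamps; richer schemes could keep histograms instead of a single mean to preserve path-dependent timing irregularities.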
Index Terms
- Preserving time in large-scale communication traces