ABSTRACT
Analyzing the performance of large-scale scientific applications is becoming increasingly difficult due to the sheer volume of performance data gathered. Recent work on scalable communication tracing applies online interprocess compression to address this problem. Yet analyzing communication traces requires knowledge of time progression that cannot trivially be encoded in a scalable manner during compression. We develop scalable time stamp encoding schemes for communication traces.
Our work also contributes novel insights into the scalable representation of time-stamped data. We show that our representations capture sufficient information to enable what-if explorations of architectural variations and analysis of path-based timing irregularities without requiring excessive disk space. We evaluate the ability of several time-stamped, compressed MPI trace approaches to enable accurate timed replay of communication events. Our lossless traces are orders of magnitude smaller, if not near-constant in size, regardless of the number of nodes, while preserving timing information suitable for application tuning or for assessing the requirements of future procurements. Our results demonstrate that time-preserving tracing without loss of communication information can scale in both the number of nodes and the number of time steps, a result without precedent.
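The abstract does not spell out the encoding schemes themselves, but the core idea behind near-constant-size timed traces can be illustrated with one plausible approach: store inter-event time deltas rather than absolute time stamps, and aggregate the deltas of repeated events (e.g., across loop iterations) into compact statistics. The sketch below is purely illustrative; the class and function names (`DeltaStats`, `encode`, `replay_duration`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the paper's actual scheme): delta time stamp
# encoding with per-event statistics. Instead of one absolute time stamp
# per event per iteration, we keep inter-event deltas and fold repeats
# into (count, min, max, mean), so the encoded size stays near-constant
# in the number of iterations.
from dataclasses import dataclass, field

@dataclass
class DeltaStats:
    count: int = 0
    total: float = 0.0
    lo: float = field(default=float("inf"))
    hi: float = field(default=float("-inf"))

    def add(self, delta: float) -> None:
        self.count += 1
        self.total += delta
        self.lo = min(self.lo, delta)
        self.hi = max(self.hi, delta)

    @property
    def mean(self) -> float:
        return self.total / self.count

def encode(events):
    """events: list of (event_id, absolute_time) in program order.

    Returns a dict mapping each event id to the statistics of the
    deltas between it and the immediately preceding event.
    """
    stats = {}
    prev_t = None
    for eid, t in events:
        if prev_t is not None:
            stats.setdefault(eid, DeltaStats()).add(t - prev_t)
        prev_t = t
    return stats

def replay_duration(stats, schedule):
    """Approximate total elapsed time of a replayed event schedule,
    using the mean recorded delta for each event."""
    return sum(stats[eid].mean for eid in schedule if eid in stats)
```

A timed replay can then reproduce approximate inter-event timing from the statistics alone, without the original per-iteration time stamps; richer schemes could keep histograms instead of a single mean to preserve path-dependent timing irregularities.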
Index Terms
- Preserving time in large-scale communication traces