DOI: 10.1145/2807591.2807634
research-article

Recovering logical structure from Charm++ event traces

Published: 15 November 2015

Abstract

Asynchrony and non-determinism in Charm++ programs present a significant challenge in analyzing their event traces. We present a new framework to organize event traces of parallel programs written in Charm++. Our reorganization allows one to more easily explore and analyze such traces by providing context through logical structure. We describe several heuristics to compensate for missing dependencies between events that currently cannot be easily recorded. We introduce a new task ordering that recovers logical structure from the non-deterministic execution order. Using the logical structure, we define several metrics to help guide developers to performance problems. We demonstrate our approach through two proxy applications written in Charm++. Finally, we discuss the applicability of this framework to other task-based runtimes and provide guidelines for tracing to support this form of analysis.
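
To make the idea of a logical ordering concrete, the following is a minimal sketch and not the ordering proposed in the paper. It assumes a hypothetical trace reduced to event ids plus known "happened-before" pairs (for example, a message send and the task it triggers), and assigns each event the smallest step that respects those dependencies. The function name `logical_steps` and the input shapes are illustrative assumptions.

```python
from collections import defaultdict, deque


def logical_steps(events, deps):
    """Assign each event the smallest step consistent with its dependencies.

    events: iterable of hashable event ids (hypothetical trace records).
    deps:   iterable of (before, after) pairs over those ids; must form a DAG.
    """
    events = list(events)
    successors = defaultdict(list)
    indegree = {e: 0 for e in events}
    for before, after in deps:
        successors[before].append(after)
        indegree[after] += 1

    # Events with no recorded predecessor start at step 0.
    step = {e: 0 for e in events}
    ready = deque(e for e in events if indegree[e] == 0)
    visited = 0
    while ready:
        e = ready.popleft()
        visited += 1
        for nxt in successors[e]:
            # A successor sits at least one step after every predecessor.
            step[nxt] = max(step[nxt], step[e] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited != len(events):
        raise ValueError("dependencies contain a cycle")
    return step


# The same events recorded in two different physical orders.
deps = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
run1 = logical_steps(["a", "b", "c", "d"], deps)
run2 = logical_steps(["d", "c", "b", "a"], deps)
```

Because the steps depend only on the dependency pairs, both runs receive the same assignment even though the events were listed in different orders. This is the sense in which a logical structure can be independent of a non-deterministic execution order; missing dependencies, which the abstract says must be compensated for with heuristics, are outside the scope of this sketch.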

Published In

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
  • General Chair: Jackie Kern
  • Program Chair: Jeffrey S. Vetter
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. asynchrony
  2. performance
  3. task-based models
  4. trace analysis

Qualifiers

  • Research-article

Conference

SC15

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
