OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: PredCom: A Predictive Approach to Collecting Approximated Communication Traces

Journal Article · IEEE Transactions on Parallel and Distributed Systems

Communication traces collected from MPI applications are an important source of information for performance optimization, as they can help analysts determine communication patterns and identify inefficiencies. However, collecting them, especially at scale, is time-consuming, since it usually requires running the complete target application on a large number of nodes. In this work, we present PredCom, a tool-chain that generates a predictive communication proxy from information gathered in a few small-scale runs, which allows us to extract approximate communication traces with an accuracy high enough for most analysis goals. To do this, we combine LLVM passes on the original source code (to capture static program structure) with parameter prediction (to capture dynamic and scaling behavior). This approach drastically reduces the time needed to collect communication traces, even for traces on large numbers of MPI processes. We demonstrate that PredCom generates communication traces of various applications up to 1612x faster with an accuracy loss of 0.11 on average compared to the original large-scale traces, and we show that the generated traces can be used to optimize process placement.

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); Japan Science and Technology Agency (JST); Kayamori Foundation of Informational Science Advancement; Japan Society for the Promotion of Science (JSPS)
Grant/Contract Number:
AC52-07NA27344; JP20H04193
OSTI ID:
1769152
Report Number(s):
LLNL-JRNL-748663; 933780
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 1; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
