DOI: 10.1145/1066650.1066666
Article

Replicating memory behavior for performance prediction

Published: 22 October 2004

Abstract

This paper introduces a method to monitor an application and generate a short synthetic "memory skeleton" program whose memory access pattern is representative of the application. In particular, the application and its memory skeleton should have similar cache behavior on any memory hierarchy architecture. The objective is to quickly estimate the cache performance of an application on any memory architecture by running its memory skeleton. The paper presents and validates a framework for automatic construction of memory skeletons. The approach is based on sampling the address trace of an executing application, summarizing it, and then employing it to generate a synthetic memory skeleton program. The broad goal of this research is construction of "performance skeletons" designed to quickly estimate the performance of a large application in an unpredictable environment. A performance skeleton must also mimic the communication and execution behavior of the application. However, the memory behavior drives the performance of many scientific applications and hence memory skeletons are a critical component of this approach to performance estimation.
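The sample, summarize, and generate pipeline described in the abstract can be illustrated with a toy sketch. This is not the paper's framework: the abstract does not say how the trace is summarized, so the stride histogram used here is an assumption chosen for brevity, and every name (`sample_trace`, `summarize`, `generate_skeleton`, `lru_miss_ratio`) and parameter (64-byte lines, window and period sizes) is hypothetical. The small LRU simulator is included only so that the original and synthetic traces can be compared at different cache sizes.

```python
import random
from collections import Counter, OrderedDict


def sample_trace(trace, window, period):
    """Keep the first `window` addresses out of every `period` addresses."""
    return [addr for i, addr in enumerate(trace) if i % period < window]


def summarize(samples, line_size=64):
    """Summarize a sampled trace as a histogram of cache-line strides.

    Assumption: a stride histogram stands in for whatever summary the
    paper uses. Strides across sampling-window boundaries are counted
    too, which is a simplification.
    """
    lines = [addr // line_size for addr in samples]
    strides = Counter(b - a for a, b in zip(lines, lines[1:]))
    return {"start": lines[0], "strides": strides, "line_size": line_size}


def generate_skeleton(summary, length, seed=0):
    """Generate synthetic addresses whose strides are drawn from the summary."""
    rng = random.Random(seed)
    values = list(summary["strides"].keys())
    weights = list(summary["strides"].values())
    line = summary["start"]
    out = []
    for stride in rng.choices(values, weights=weights, k=length):
        out.append(line * summary["line_size"])
        line = max(0, line + stride)  # clamp so addresses stay non-negative
    return out


def lru_miss_ratio(trace, num_lines, line_size=64):
    """Miss ratio of a fully associative LRU cache holding `num_lines` lines."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(trace)


if __name__ == "__main__":
    # Hypothetical application trace: a sequential sweep over 8-byte elements.
    app_trace = [8 * i for i in range(10_000)]
    summary = summarize(sample_trace(app_trace, window=100, period=1000))
    skeleton = generate_skeleton(summary, length=2_000)
    for size in (16, 256):
        print(size, lru_miss_ratio(app_trace, size), lru_miss_ratio(skeleton, size))
```

A stride histogram discards ordering and reuse information, so this sketch would only track miss ratios for very regular access patterns; it is meant to show the shape of the pipeline, not its accuracy.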


    Published In

    LCR '04: Proceedings of the 7th Workshop on Languages, Compilers, and Run-time Support for Scalable Systems
    October 2004
    134 pages
    ISBN: 9781450377997
    DOI: 10.1145/1066650

    Sponsors

    • The Texas Learning & Computation Center
    • University of Houston

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Cited By

    • (2018) Performance prediction with skeletons. Cluster Computing 11:2 (151-165). DOI: 10.1007/s10586-007-0039-2. Online publication date: 24-Dec-2018
    • (2015) A Methodology to Model the Execution of Communication Software for Accurate Network Simulation. ACM Transactions on Modeling and Computer Simulation (TOMACS) 26:1 (1-31). DOI: 10.1145/2746233. Online publication date: 21-Jul-2015
    • (2008) Construction and evaluation of coordinated performance skeletons. Proceedings of the 15th international conference on High performance computing (73-86). DOI: 10.5555/1791889.1791901. Online publication date: 17-Dec-2008
    • (2008) Automatic construction of coordinated performance skeletons. 2008 IEEE International Symposium on Parallel and Distributed Processing (1-5). DOI: 10.1109/IPDPS.2008.4536405. Online publication date: Apr-2008
    • (2008) Construction and Evaluation of Coordinated Performance Skeletons. High Performance Computing - HiPC 2008 (73-86). DOI: 10.1007/978-3-540-89894-8_10. Online publication date: 2008
    • (2006) Path Grammar Guided Trace Compression and Trace Approximation. 2006 15th IEEE International Conference on High Performance Distributed Computing (57-68). DOI: 10.1109/HPDC.2006.1652136. Online publication date: 2006
    • (2005) Automatic Construction and Evaluation of Performance Skeletons. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. DOI: 10.1109/IPDPS.2005.117. Online publication date: 4-Apr-2005
