Replicating memory behavior for performance prediction

Authors:

Aditya Toomula,

Jaspal Subhlok

LCR '04: Proceedings of the 7th Workshop on Languages, Compilers, and Run-Time Support for Scalable Systems

Pages 1 - 8

https://doi.org/10.1145/1066650.1066666

Published: 22 October 2004

Abstract

This paper introduces a method to monitor an application and generate a short synthetic "memory skeleton" program whose memory access pattern is representative of the application. In particular, the application and its memory skeleton should have similar cache behavior on any memory hierarchy architecture. The objective is to quickly estimate the cache performance of an application on any memory architecture by running its memory skeleton. The paper presents and validates a framework for the automatic construction of memory skeletons. The approach is based on sampling the address trace of an executing application, summarizing it, and then using the summary to generate a synthetic memory skeleton program. The broad goal of this research is the construction of "performance skeletons" designed to quickly estimate the performance of a large application in an unpredictable environment. A performance skeleton must also mimic the communication and execution behavior of the application. However, memory behavior drives the performance of many scientific applications, and hence memory skeletons are a critical component of this approach to performance estimation.
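
The sample, summarize, generate pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' framework: the stride-histogram summary, the direct-mapped cache model, and every function name and parameter below (`sample_trace`, `summarize`, `generate_skeleton`, `miss_ratio`, `period`, `burst`) are assumptions chosen only to make the idea concrete.

```python
import random
from collections import Counter

def sample_trace(trace, period, burst):
    """Keep the first `burst` addresses out of every `period` addresses."""
    return [trace[i:i + burst] for i in range(0, len(trace), period)]

def summarize(bursts):
    """Histogram of address strides seen inside each sampled burst.

    Strides across burst boundaries are skipped because they are an
    artifact of sampling, not of the application.
    """
    strides = Counter()
    for b in bursts:
        for prev, cur in zip(b, b[1:]):
            strides[cur - prev] += 1
    return strides

def generate_skeleton(strides, length, start=0, seed=0):
    """Synthetic address sequence whose strides follow the histogram."""
    rng = random.Random(seed)
    values = list(strides.keys())
    weights = list(strides.values())
    addr, out = start, []
    for _ in range(length):
        out.append(addr)
        # Draw the next stride in proportion to its observed frequency.
        addr = max(0, addr + rng.choices(values, weights)[0])
    return out

def miss_ratio(trace, num_lines=64, line_size=64):
    """Miss ratio of a direct-mapped cache (hypothetical geometry)."""
    cache = [None] * num_lines
    misses = 0
    for addr in trace:
        block = addr // line_size
        slot = block % num_lines
        if cache[slot] != block:
            cache[slot] = block
            misses += 1
    return misses / len(trace)

# Toy "application": sequential 8-byte accesses.
app_trace = [8 * i for i in range(4096)]
bursts = sample_trace(app_trace, period=1000, burst=100)
skeleton = generate_skeleton(summarize(bursts), length=512)
print(miss_ratio(app_trace), miss_ratio(skeleton))
```

In this sketch the skeleton is far shorter than the trace it was derived from, yet it can be replayed against different cache geometries by changing `num_lines` and `line_size`. A realistic summary would need to capture much more than strides, such as reuse distances and phase behavior.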

Index Terms

General and reference: Cross-computing tools and techniques

Published In

LCR '04: Proceedings of the 7th Workshop on Languages, Compilers, and Run-Time Support for Scalable Systems

October 2004

134 pages

ISBN: 9781450377997

DOI: 10.1145/1066650

General Chair: Alan Cox (Rice University)

Program Chair: Jaspal Subhlok (University of Houston)

Copyright © 2004 ACM.

Sponsors

The Texas Learning & Computation Center
University of Houston

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

LCR04: 7th Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers

October 22-23, 2004

Houston, Texas, USA
