Article

Characteristics of workloads used in high performance and technical computing

Authors:
Razvan Cheveresan, Sun Microsystems, Santa Clara, CA
Matt Ramsay, Sun Microsystems, Austin, TX
Chris Feucht, Sun Microsystems, Santa Clara, CA
Ilya Sharapov, Apple, Cupertino, CA

ICS '07: Proceedings of the 21st annual international conference on Supercomputing, June 2007, pages 73–82. https://doi.org/10.1145/1274971.1274984

Published: 17 June 2007


ABSTRACT

This paper provides a systematic comparison of various characteristics of computationally intensive workloads. Our analysis focuses on standard HPC benchmarks and representative applications. For the selected workloads, we provide a wide range of characterizations based on instruction tracing and hardware counter measurements.
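As a rough illustration of a trace-based characterization, the sketch below turns an instruction trace into a dynamic instruction distribution. It is not the authors' tooling: it assumes the trace is already available as a sequence of opcode mnemonics, and the opcode-to-class table `OPCODE_CLASS` and the function `instruction_mix` are hypothetical names chosen for this example.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical mapping from opcode mnemonic to instruction class.
# A real study would derive this from the ISA manual of the traced machine.
OPCODE_CLASS: Dict[str, str] = {
    "ld": "load", "ldd": "load",
    "st": "store", "std": "store",
    "fadd": "fp", "fmul": "fp", "fdiv": "fp",
    "add": "int", "sub": "int", "and": "int",
    "br": "branch", "call": "branch",
}


def instruction_mix(trace: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of dynamically executed instructions in each class.

    Each element of `trace` is one executed instruction's mnemonic, so an
    instruction inside a loop is counted once per iteration (dynamic count),
    not once per appearance in the binary (static count). Mnemonics missing
    from OPCODE_CLASS are grouped under "other".
    """
    counts = Counter(OPCODE_CLASS.get(op, "other") for op in trace)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


if __name__ == "__main__":
    # Toy trace: a 3-iteration loop body (load, multiply, store, branch)
    # followed by a single call.
    toy_trace = ["ld", "fmul", "st", "br"] * 3 + ["call"]
    for cls, frac in sorted(instruction_mix(toy_trace).items()):
        print(f"{cls:>6}: {frac:.2%}")
```

Counting per executed instruction rather than per static opcode is what makes the distribution "dynamic"; hardware counters can provide similar class totals without a full trace, at a coarser granularity.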

Each workload is analyzed at the instruction level by comparing the dynamic distribution of executed instructions. We also analyze memory access patterns, including various aspects of cache utilization and the locality properties of address distributions. Since prefetching plays an important role in the performance of computational workloads, we explore each workload's prefetching potential, and for parallel workloads we study the sharing properties of memory accesses. For completeness, the HPC workloads are compared with two commonly used commercial computing benchmarks.
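Reuse distance is one common way to quantify the temporal locality of an address trace; the locality metrics used in the paper itself may differ. The sketch below computes the LRU stack distance of each access and uses it to estimate the hit rate of a fully associative LRU cache. The 64-byte line size, the function names `reuse_distances` and `lru_hit_rate`, and the naive explicit-stack implementation are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Optional


def reuse_distances(addresses: Iterable[int], line_size: int = 64) -> List[Optional[int]]:
    """LRU stack distance of each access, at cache-line granularity.

    The distance is the number of distinct *other* cache lines touched since
    the previous access to the same line; None marks a first (cold) access.
    This naive version keeps an explicit LRU stack and costs O(N * M) for N
    accesses over M distinct lines, so it is only suitable for small traces.
    """
    stack: List[int] = []  # most recently used line at index 0
    result: List[Optional[int]] = []
    for addr in addresses:
        line = addr // line_size
        if line in stack:
            depth = stack.index(line)  # distinct lines used since last touch
            stack.pop(depth)
            result.append(depth)
        else:
            result.append(None)  # cold access
        stack.insert(0, line)
    return result


def lru_hit_rate(distances: List[Optional[int]], cache_lines: int) -> float:
    """Hit rate of a fully associative LRU cache holding `cache_lines` lines.

    An access hits exactly when its stack distance is below the capacity;
    cold accesses (None) always miss.
    """
    if not distances:
        return 0.0
    hits = sum(1 for d in distances if d is not None and d < cache_lines)
    return hits / len(distances)


if __name__ == "__main__":
    # Toy trace: two sweeps over 32 consecutive 8-byte elements (4 lines).
    trace = [8 * i for i in range(32)] * 2
    dists = reuse_distances(trace)
    print("distance histogram:", Counter(dists))
    for lines in (2, 4):
        print(f"LRU hit rate with {lines} lines: {lru_hit_rate(dists, lines):.3f}")
```

Distances of 0 in the toy trace reflect spatial locality within a line, while the larger distances at the start of the second sweep reflect temporal reuse that only a cache of sufficient capacity can capture.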

The results of this work show that the HPC application space is surprisingly diverse, with some codes showing data sharing and locality properties similar to those of commercial applications. The wide range of studies presented in this paper is instrumental in uncovering the diversity of this application space.



Published in
ICS '07: Proceedings of the 21st annual international conference on Supercomputing
June 2007
315 pages
ISBN:9781595937681
DOI:10.1145/1274971
General Chair:
Burton Smith
Microsoft
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HPC
cache coherency
data locality
instruction decomposition
software prefetch
workload characterization

Acceptance Rates
Overall acceptance rate: 584 of 2,055 submissions, 28%
Article Metrics
- 41 total citations
- 778 total downloads
- Downloads (last 12 months): 16
- Downloads (last 6 weeks): 2