DOI: 10.1145/1274971.1274984
Article

Characteristics of workloads used in high performance and technical computing

Published: 17 June 2007

ABSTRACT

This paper provides a systematic comparison of various characteristics of computationally-intensive workloads. Our analysis focuses on standard HPC benchmarks and representative applications. For the selected workloads we provide a wide range of characterizations based on instruction tracing and hardware counter measurements.

Each workload is analyzed at the instruction level by comparing the dynamic distribution of executed instructions. We also analyze memory access patterns, including various aspects of cache utilization and locality properties of address distributions. Since prefetching plays an important role in the performance of computational workloads, we explore the prefetching potential, and for parallel workloads we study the sharing properties of memory accesses. For completeness, HPC workloads are compared to two commonly used commercial computing benchmarks.

The results of this work show that the HPC application space is surprisingly diverse, with some codes showing data sharing and locality properties similar to those of commercial applications. The wide range of studies presented in this paper is instrumental in uncovering the diversity of this application space.


        • Published in

          ICS '07: Proceedings of the 21st annual international conference on Supercomputing
          June 2007
          315 pages
          ISBN: 9781595937681
          DOI: 10.1145/1274971

          Copyright © 2007 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 584 of 2,055 submissions, 28%
