DOI: 10.1145/1122018.1122054

Automatic benchmark generation for cache optimization of matrix operations

Published: 17 March 1995

Abstract

Computationally intensive algorithms must usually be restructured to make the best use of cache memory in current high-performance, hierarchical-memory computers. Unfortunately, cache-conscious algorithms are sensitive to object sizes and addresses as well as to the details of the cache and translation lookaside buffer geometries, and this sensitivity makes both automatic restructuring and hand-tuning difficult tasks. This paper presents an optimization approach that automatically generates and executes a benchmark program from a concise specification of the algorithm's structure. This technique provides the performance data needed to verify code generation heuristics or to search among the various restructuring options. Matrix transpose and matrix multiplication are examined using this approach on several workstations, with restructuring options of loop order, tiling (blocking), and unrolling.
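
To make the idea of generating a benchmark from a concise specification more concrete, here is a minimal sketch. It is not the authors' tool: the specification format (a loop order plus one tile size), the function names `generate_kernel_source`, `compile_kernel`, and `run_benchmark`, and the use of Python are all assumptions made for illustration. It covers loop order and tiling for matrix multiplication only and omits unrolling.

```python
import itertools
import time


def generate_kernel_source(order, tile):
    """Return Python source for C += A * B with the given loop structure.

    order: permutation of "ijk", outermost loop first (hypothetical spec).
    tile:  0 for no tiling, otherwise the tile edge applied to all three loops.
    """
    lines = ["def kernel(A, B, C, n):"]
    depth = 1
    if tile:
        for v in order:  # outer loops step over tile origins ii, jj, kk
            lines.append("    " * depth + f"for {v}{v} in range(0, n, {tile}):")
            depth += 1
    for v in order:  # inner loops visit elements (within a tile if tiled)
        bounds = f"{v}{v}, min({v}{v} + {tile}, n)" if tile else "n"
        lines.append("    " * depth + f"for {v} in range({bounds}):")
        depth += 1
    lines.append("    " * depth + "C[i][j] += A[i][k] * B[k][j]")
    return "\n".join(lines) + "\n"


def compile_kernel(source):
    """Execute generated source and return the resulting kernel function."""
    namespace = {}
    exec(source, namespace)
    return namespace["kernel"]


def run_benchmark(n=24, tiles=(0, 8)):
    """Time every loop order and tile size; return results, fastest first."""
    # Small integer-valued inputs keep the arithmetic exact, so all
    # variants can be compared for equality against the first one.
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    reference = None
    results = []
    for order in itertools.permutations("ijk"):
        for tile in tiles:
            kernel = compile_kernel(generate_kernel_source(order, tile))
            C = [[0.0] * n for _ in range(n)]
            start = time.perf_counter()
            kernel(A, B, C, n)
            elapsed = time.perf_counter() - start
            if reference is None:
                reference = C
            elif C != reference:
                raise AssertionError(f"variant {order} tile={tile} is wrong")
            results.append(("".join(order), tile, elapsed))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    for order, tile, seconds in run_benchmark():
        print(f"order={order} tile={tile} time={seconds:.6f}s")
```

In pure Python the timings are dominated by interpreter overhead rather than cache behavior, so the ranking printed here says little about a real memory hierarchy. A practical harness along these lines would presumably emit and compile C or Fortran kernels, and would vary matrix sizes and addresses as well, since the abstract identifies those as sources of sensitivity.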

Published In

ACMSE '95: Proceedings of the 33rd annual ACM Southeast Conference
March 1995
300 pages
ISBN: 0897917472
DOI: 10.1145/1122018

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ACMSE '95
March 17 - 18, 1995
Clemson, South Carolina

Acceptance Rates

ACMSE '95 Paper Acceptance Rate 47 of 75 submissions, 63%;
Overall Acceptance Rate 502 of 1,023 submissions, 49%

Article Metrics

  • Downloads (Last 12 months): 29
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 05 Mar 2025

Cited By

  • (2022) SQL to Stream with S2S: An Automatic Benchmark Generator for the Java Stream API. Proceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp. 179-186. DOI: 10.1145/3564719.3568699. Online publication date: 29-Nov-2022
  • (2019) Spin Summations. ACM Transactions on Mathematical Software 45:1, pp. 1-22. DOI: 10.1145/3301319. Online publication date: 14-Mar-2019
  • (2017) TTC. ACM Transactions on Mathematical Software 44:2, pp. 1-21. DOI: 10.1145/3104988. Online publication date: 16-Aug-2017
  • (2017) HPTT: a high-performance tensor transposition C++ library. Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 56-62. DOI: 10.1145/3091966.3091968. Online publication date: 18-Jun-2017
  • (2016) Statistical Models for Empirical Search-Based Performance Tuning. The International Journal of High Performance Computing Applications 18:1, pp. 65-94. DOI: 10.1177/1094342004041293. Online publication date: 26-Jul-2016
  • (2016) TTC: a tensor transposition compiler for multiple architectures. Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 41-46. DOI: 10.1145/2935323.2935328. Online publication date: 2-Jun-2016
  • (2014) Optimizing matrix multiply using PHiPAC. ACM International Conference on Supercomputing 25th Anniversary Volume, pp. 253-260. DOI: 10.1145/2591635.2667174. Online publication date: 10-Jun-2014
  • (2012) Optimizing matrix transposes using a POWER7 cache model and explicit prefetching. ACM SIGMETRICS Performance Evaluation Review 40:2, pp. 68-73. DOI: 10.1145/2381056.2381073. Online publication date: 8-Oct-2012
  • (1997) Optimizing matrix multiply using PHiPAC. Proceedings of the 11th international conference on Supercomputing, pp. 340-347. DOI: 10.1145/263580.263662. Online publication date: 11-Jul-1997
