research-article

Extended histories: improving regularity and performance in correlation prefetchers

Authors:
R. Manikantan, Indian Institute of Science, Bangalore, India
R. Govindarajan, Indian Institute of Science, Bangalore, India
Kaushik Rajan, Microsoft Research India

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, January 2011, Pages 67–76, https://doi.org/10.1145/1944862.1944875

Published: 24 January 2011

ABSTRACT

Data prefetchers identify and exploit any regularity present in the history (training) stream to predict future references and prefetch them into the cache. The training information is typically the primary misses seen at a particular cache level, which is a filtered version of the accesses seen by that cache. In this work we demonstrate that extending the training information to include secondary misses and hits, along with primary misses, improves prefetcher performance. In addition to an empirical evaluation, we use the information-theoretic metric entropy to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default training stream of primary misses only, and they corroborate our empirical findings.
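To make the entropy argument concrete, the sketch below computes the first-order Shannon entropy, H = Σ p(x) log2(1/p(x)), of the address-delta sequence in a training stream, where a lower value means a more predictable stream. The abstract does not state which symbol alphabet or entropy order the paper uses, so measuring entropy over deltas between consecutive block addresses is an assumption made here for illustration only. The function name `delta_entropy` and the toy streams are hypothetical.

```python
import math
from collections import Counter


def delta_entropy(addresses):
    """First-order Shannon entropy (bits per symbol) of the deltas between
    consecutive addresses in a training stream.

    Assumption (illustrative only): regularity is measured over address
    deltas; the paper's exact formulation may differ.
    """
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if not deltas:
        return 0.0
    counts = Counter(deltas)
    total = len(deltas)
    # H = sum over observed deltas of p(x) * log2(1 / p(x))
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical block-address streams: a perfectly strided stream, and the
# same stream with every third reference (starting at index 1) filtered
# out, which leaves alternating gaps between the surviving references.
full_stream = list(range(0, 64, 2))  # constant delta of 2
filtered_stream = [a for i, a in enumerate(full_stream) if i % 3 != 1]

print(delta_entropy(full_stream))      # 0.0
print(delta_entropy(filtered_stream))  # 1.0
```

In this toy case, filtering a regular stream raises its delta entropy from 0 to 1 bit per symbol, which mirrors the intuition that a filtered (primary-miss-only) history can look less regular than the full access stream it was derived from.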

With extended histories, further benefits can be achieved by also triggering prefetches on secondary misses. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to different extents from extended histories and alternative trigger points, and that the best-performing design point varies from benchmark to benchmark. To address this, we propose a simple adaptive scheme that identifies the best-performing design point for each benchmark-prefetcher combination at runtime.
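The two design axes described above, which events train the history and which events trigger a prefetch, can be illustrated with a simplified delta-correlation prefetcher. The sketch keeps a short per-zone history of block addresses, forms the delta sequence, searches backwards for the most recent earlier occurrence of the latest pair of deltas, and replays the deltas that followed it. It is a generic delta-correlation sketch in the spirit of zone-localized (CZone-style) prefetching, not the paper's implementation; the class name `DeltaCorrelator`, the event labels, and the `zone_bits`, `history` and `degree` parameters are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical labels for the kind of L2 event an access produced.
PRIMARY_MISS, SECONDARY_MISS, HIT = "primary", "secondary", "hit"


class DeltaCorrelator:
    """Simplified zone-localized delta-correlation prefetcher (illustrative)."""

    def __init__(self, train_on, trigger_on, zone_bits=12, history=16, degree=4):
        self.train_on = set(train_on)      # events recorded in the history
        self.trigger_on = set(trigger_on)  # events allowed to issue prefetches
        self.zone_bits = zone_bits         # zone = block_addr >> zone_bits (assumed)
        self.degree = degree               # max prefetches issued per trigger
        self.hist = defaultdict(lambda: deque(maxlen=history))

    def access(self, block_addr, event):
        """Process one L2 access; return the block addresses to prefetch."""
        h = self.hist[block_addr >> self.zone_bits]
        if event in self.train_on:
            h.append(block_addr)
        if event not in self.trigger_on:
            return []
        return self._predict(list(h))

    def _predict(self, addrs):
        deltas = [b - a for a, b in zip(addrs, addrs[1:])]
        if len(deltas) < 3:
            return []  # need the last pair plus at least one earlier delta
        key = (deltas[-2], deltas[-1])
        # Search backwards for an earlier occurrence of the last delta pair.
        for i in range(len(deltas) - 2, 0, -1):
            if (deltas[i - 1], deltas[i]) == key:
                # Replay the deltas that followed the match.
                out, addr = [], addrs[-1]
                for d in deltas[i + 1:i + 1 + self.degree]:
                    addr += d
                    out.append(addr)
                return out
        return []


# Toy trace: primary misses interleaved with hits in one zone.
trace = [(100, PRIMARY_MISS), (101, HIT), (103, PRIMARY_MISS),
         (104, HIT), (106, PRIMARY_MISS)]
extended = DeltaCorrelator(train_on={PRIMARY_MISS, SECONDARY_MISS, HIT},
                           trigger_on={PRIMARY_MISS, SECONDARY_MISS})
primary_only = DeltaCorrelator(train_on={PRIMARY_MISS}, trigger_on={PRIMARY_MISS})
for addr, ev in trace:
    ext_pf = extended.access(addr, ev)
    prim_pf = primary_only.access(addr, ev)
print(ext_pf, prim_pf)  # [107, 109] []
```

On this toy trace, the prefetcher trained on every access sees the delta pattern 1, 2, 1, 2 and predicts blocks 107 and 109 at the final miss, while the primary-miss-only variant has seen just two deltas and cannot yet predict. Changing `train_on` and `trigger_on` is one simple way to enumerate the history and trigger-point combinations; how the paper's adaptive scheme chooses among them is not described in the abstract.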

On the SPEC2000 benchmarks, using all L2 accesses as the prefetcher's history improves performance, in terms of both IPC and the number of misses removed, over techniques that use only primary misses as history. The adaptive scheme improves the performance of the CZone prefetcher over the Baseline by 4.6% on average. These performance gains are accompanied by a moderate reduction in memory traffic requirements.

Index Terms

1. Computer systems organization
  1. Architectures

Published in
HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
January 2011
226 pages
ISBN: 9781450302418
DOI: 10.1145/1944862
General Chairs: Manolis Katevenis (FORTH-ICS and U. Crete, Greece), Margaret Martonosi (Princeton University)
Program Chairs: Christos Kozyrakis (Stanford University), Olivier Temam (INRIA, France)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Article Metrics
- Total Citations: 5
- Total Downloads: 149
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 0