Abstract
We explore different prefetch distance-degree combinations and very simple, low-cost adaptive policies on a superscalar core with a high-bandwidth, high-capacity on-chip memory hierarchy. We show that the aggressiveness of sequential prefetching can be tuned at very low cost to outperform state-of-the-art hardware data prefetchers and complex filtering mechanisms, while avoiding performance losses in hostile applications and keeping prefetch-induced pressure on the cache low. This makes it a realistic implementation option for current processors.
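To make the terms distance and degree concrete, the following is a minimal Python sketch and not the mechanism evaluated in the paper. It assumes one common convention: on a trigger at cache line `trigger_line`, a sequential prefetcher requests `degree` consecutive lines starting `distance` lines ahead. The function names, the accuracy thresholds (`low`, `high`), and the degree bounds are all hypothetical and chosen only for illustration of how a low-cost adaptive policy might adjust aggressiveness.

```python
def sequential_prefetch_candidates(trigger_line, distance, degree):
    """Return the cache-line addresses a sequential prefetcher would request.

    Assumed convention (not taken from the paper): start `distance` lines
    ahead of the triggering line and request `degree` consecutive lines.
    """
    if distance < 1 or degree < 0:
        raise ValueError("distance must be >= 1 and degree must be >= 0")
    first = trigger_line + distance
    return [first + i for i in range(degree)]


def adapt_degree(degree, useful, issued, low=0.25, high=0.75,
                 min_degree=1, max_degree=4):
    """Hypothetical adaptive policy: adjust degree from prefetch accuracy.

    `useful` and `issued` are counts gathered over some sampling interval.
    The thresholds and bounds are illustrative assumptions.
    """
    if issued == 0:
        return degree  # no evidence in this interval, keep the setting
    accuracy = useful / issued
    if accuracy < low:
        return max(min_degree, degree - 1)  # back off when prefetches are wasted
    if accuracy > high:
        return min(max_degree, degree + 1)  # be more aggressive when they pay off
    return degree


if __name__ == "__main__":
    print(sequential_prefetch_candidates(100, distance=4, degree=2))
    print(adapt_degree(2, useful=9, issued=10))
```

Under these assumptions, a larger distance targets timeliness (lines arrive before they are needed), while a larger degree trades extra cache and bandwidth pressure for coverage, which is why the two are tuned together.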
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ramos, L.M., Briz, J.L., Ibáñez, P.E., Viñals, V. (2008). Low-Cost Adaptive Data Prefetching. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_36
DOI: https://doi.org/10.1007/978-3-540-85451-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science (R0)