Abstract
Memory operations in irregular code are difficult to prefetch because the future addresses accessed by a load are hard for a compiler to anticipate. However, recent studies, as well as our own experience, indicate that many irregular programs contain loads with near-constant strides. This paper presents a novel compiler technique to profile and prefetch for those loads. For each profiled load, the profile captures not only the dominant stride values but also the differences between successive strides. This information lets the compiler classify loads as strongly or weakly strided, and as single-strided or phased multi-strided. Prefetching decisions guided by these classifications are highly selective and beneficial. We obtain significant performance improvements for the SPEC CPU2000 integer programs running on Itanium™ machines: for example, a 1.55x speedup for 181.mcf, 1.15x for 254.gap, and 1.08x for 197.parser, with smaller gains in other benchmarks. We also show that the performance gain is stable across profile data sets and that the profiling overhead is low. These benefits make the new technique suitable for a production compiler.
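The profile-then-classify idea can be illustrated with a short Python sketch. This is a hypothetical example, not the paper's implementation. The profile layout (`StrideProfile`), the helper names (`profile_load`, `classify`, `prefetch_address`), the top-4 stride table, the threshold values, and the prefetch distance are all assumptions chosen for clarity. For one load, the sketch records the most frequent strides and counts how often a stride equals the previous one, which is a zero stride difference. It then uses those two signals to label the load. A load is treated as phased multi-strided when a few dominant strides together cover most accesses and the stride rarely changes between consecutive accesses.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
STRONG_FRAC = 0.70  # share of strides needed to call a load "strong"
WEAK_FRAC = 0.30    # minimum share of the top stride for "weak"
PHASE_FRAC = 0.50   # share of zero stride differences that signals phases


@dataclass
class StrideProfile:
    """Per-load stride summary (a hypothetical layout)."""
    num_strides: int
    top_strides: list[tuple[int, int]]  # (stride, count), most frequent first
    num_diffs: int
    zero_diffs: int


def profile_load(addresses: list[int], top_n: int = 4) -> StrideProfile:
    """Summarize the address stream observed for one static load."""
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    # A zero difference means the stride repeated the previous stride.
    diffs = [t - s for s, t in zip(strides, strides[1:])]
    return StrideProfile(
        num_strides=len(strides),
        top_strides=Counter(strides).most_common(top_n),
        num_diffs=len(diffs),
        zero_diffs=diffs.count(0),
    )


def classify(p: StrideProfile) -> str:
    """Label a load from its profile using the hypothetical thresholds."""
    if p.num_strides == 0:
        return "not strided"
    top_frac = p.top_strides[0][1] / p.num_strides
    covered_frac = sum(c for _, c in p.top_strides) / p.num_strides
    zero_frac = p.zero_diffs / p.num_diffs if p.num_diffs else 0.0
    if top_frac >= STRONG_FRAC:
        return "strong single-strided"
    if covered_frac >= STRONG_FRAC and zero_frac >= PHASE_FRAC:
        return "phased multi-strided"
    if top_frac >= WEAK_FRAC:
        return "weak single-strided"
    return "not strided"


def prefetch_address(kind, addr, prev_addr, profiled_stride, distance=4):
    """Target of a prefetch placed before the load, or None (an assumption)."""
    if kind == "strong single-strided":
        # Constant stride taken from the profile.
        return addr + distance * profiled_stride
    if kind == "phased multi-strided":
        # Stride recomputed at run time from the previous address.
        return addr + distance * (addr - prev_addr)
    return None  # this sketch does not prefetch for other classes
```

For example, a stream that advances by 8 bytes for 20 accesses and then by 24 bytes for 20 more is labeled phased multi-strided. A stream that alternates 8, 24, 8, 24 has the same stride histogram but almost never repeats a stride, so it falls back to weak single-strided. The stride histogram alone cannot separate these two cases, which is why the profile also tracks differences between successive strides.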
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, Y., Serrano, M., Krishnaiyer, R., Li, W., Fang, J. (2002). Value-Profile Guided Stride Prefetching for Irregular Code. In: Horspool, R.N. (eds) Compiler Construction. CC 2002. Lecture Notes in Computer Science, vol 2304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45937-5_22
DOI: https://doi.org/10.1007/3-540-45937-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43369-9
Online ISBN: 978-3-540-45937-8
eBook Packages: Springer Book Archive