Execution History Guided Instruction Prefetching

The Journal of Supercomputing

Abstract

The increasing gap in performance between processors and main memory has made effective instruction prefetching techniques more important than ever. A major deficiency of existing prefetching methods is that most of them require an extra port to the I-cache. A recent study by Rivers et al. [19] shows that this factor alone explains why most modern microprocessors do not use such hardware-based I-cache prefetch schemes. The contribution of this paper is two-fold. First, we present a method that does not require an extra port to the I-cache. Second, the performance improvement of our method is greater than that of the best competing method, BHGP [23], even disregarding the improvement from not having an extra port. The three key features of our method that prevent the above deficiencies are as follows. First, late prefetching is prevented by correlating misses to dynamically preceding instructions. For example, if the I-cache miss latency is 12 cycles, then the instruction that was fetched 12 cycles prior to the miss is used as the prefetch trigger. Second, the miss history table is kept to a reasonable size by grouping contiguous cache misses together and associating them with one preceding instruction and, therefore, one table entry. Third, the extra I-cache port is avoided through efficient prefetch filtering methods. Experiments show that for our benchmarks, chosen for their poor I-cache performance, an average improvement of 9.2% in runtime is achieved versus the BHGP method [23], while the hardware cost is also reduced. The improvement will be greater if the runtime impact of avoiding an extra port is considered. When compared to the original machine without prefetching, our method improves performance by about 35% for our benchmarks.
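The first two features described in the abstract (choosing a trigger that precedes the miss by the miss latency, and grouping contiguous misses into one table entry) can be illustrated with a small trace-driven sketch. This is not the authors' implementation: the function names `build_miss_table` and `prefetches_for`, the trace format, and the simplification that exactly one cache block is fetched per cycle are all assumptions made for illustration, and the prefetch filtering step is omitted because the abstract does not describe it.

```python
MISS_LATENCY = 12  # cycles; the example value used in the abstract


def build_miss_table(fetches, latency=MISS_LATENCY):
    """Build a miss history table from a fetch trace.

    fetches: list of (block_address, was_miss) tuples, one per cycle
             (hypothetical trace format, assumed for this sketch).
    Returns {trigger_block: (first_missed_block, run_length)}.
    """
    table = {}
    i = 0
    while i < len(fetches):
        block, was_miss = fetches[i]
        if not was_miss or i < latency:
            # Hits, and misses too early to have a trigger, are skipped.
            i += 1
            continue
        # Group contiguous misses: extend the run while the next fetches
        # also miss and touch the next sequential blocks.
        run = 1
        while (i + run < len(fetches)
               and fetches[i + run][1]
               and fetches[i + run][0] == block + run):
            run += 1
        # The trigger is whatever was fetched `latency` cycles before the miss.
        trigger = fetches[i - latency][0]
        table[trigger] = (block, run)  # one entry covers the whole run
        i += run
    return table


def prefetches_for(table, fetched_block):
    """Return the blocks to prefetch when fetched_block acts as a trigger."""
    if fetched_block not in table:
        return []
    first, run = table[fetched_block]
    return list(range(first, first + run))


# Twelve hits on blocks 100..111, then three contiguous misses, then a hit.
trace = [(100 + k, False) for k in range(12)]
trace += [(200, True), (201, True), (202, True), (300, False)]
table = build_miss_table(trace)
```

In this sketch the three misses on blocks 200 to 202 share a single table entry keyed by block 100, the block fetched 12 cycles before the first miss, so fetching block 100 again would request all three blocks early enough to hide the miss latency.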

References

  1. A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In The VLDB Journal, pp. 266–277, 1999.

  2. Alpha Architecture Handbook, Digital Equipment Corporation, Maynard, MA, 1994.

  3. D. Burger and T. Austin. The SimpleScalar Tool Set, Version 2.0. Technical report TR 1342. University of Wisconsin, Madison, WI, June 1997.

  4. S. P. E. Corporation. The SPEC benchmark suites. http://www.spec.org/.

  5. A. M. Grizzaffi, M. Colette, M. Donnelly, and B. R. Olszewski. Contrasting characteristics and cache performance of technical and multi-user commercial workloads. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 145–155, October 1994.

  6. J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach, 2nd edn. Morgan Kaufmann, Palo Alto, CA, 1996.

  7. D. S. Henry, B. C. Kuszmaul, G. H. Loh, and R. Sami. Circuits for wide-window superscalar processors. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA), pp. 236–247, Vancouver, British Columbia, Canada, June 2000.

  8. W. C. Hsu and J. E. Smith. A performance study of instruction cache prefetching methods. IEEE Transactions on Computers, 47(5):497–508, May 1998.

  9. Intel IA-64 Architecture Software Developer's Manual, Volumes I–IV. Intel Corporation, January 2000. Also available at http://developer.intel.com

  10. Intel(R) Itanium(TM) Processor Hardware Developer's Manual. Intel Corporation, August 2001.

  11. D. Joseph and D. Grunwald. Prefetching using Markov predictors. IEEE Transactions on Computers, 48(2):121–133, 1999.

  12. G. Lauterbach and T. Horel. UltraSPARC-III: designing third generation 64-bit performance. IEEE Micro, 19(3):56–66, 1999.

  13. C. K. Luk and T. C. Mowry. Cooperative instruction prefetching in modern processors. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, pp. 182–194, November 30–December 2, 1998.

  14. Y. Patt, S. J. Patel, M. Evers, D. H. Friendly, and J. Stark. One billion transistors, one uniprocessor, one chip. IEEE Computer, 30(9):51–58, September 1997.

  15. J. Pierce and T. N. Mudge. Wrong-path instruction prefetching. In International Symposium on Microarchitecture, pp. 165–175, 1996.

  16. IBM Regains Performance Lead with Power2. Microprocessor Report, October 1993.

  17. PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual. IBM Corporation, 1999.

  18. G. Reinman, B. Calder, and T. Austin. Fetch directed instruction prefetching. In Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-32), pp. 16–27, Haifa, Israel, November 1999.

  19. J. A. Rivers, G. S. Tyson, E. S. Davidson, and T. M. Austin. On high-bandwidth data cache design for multi-issue processors. In Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, pp. 46–56, December 1–3, 1997.

  20. International Technology Roadmap for Semiconductors, 1998 Update. Semiconductor Industry Association, p. 4, 1998.

  21. K. Skadron, P. S. Ahuja, M. Martonosi, and D. W. Clark. Improving prediction for procedure returns with return-address-stack repair mechanisms. In International Symposium on Microarchitecture, pp. 259–271, 1998.

  22. A. J. Smith. Cache memories. ACM Computing Surveys, 14(3):473–530, September 1982.

  23. V. Srinivasan, E. S. Davidson, G. S. Tyson, M. J. Charney, and T. R. Puzak. Branch history guided instruction prefetching. In Proceedings of the 7th International Conference on High Performance Computer Architecture (HPCA), pp. 291–300, Monterrey, Mexico, January 2001.

  24. J. Tse and A. J. Smith. CPU cache prefetching: timing evaluation of hardware implementations. IEEE Transactions on Computers, 47(5):509–526, May 1998.

  25. K. Yeager, A. Ani, A. Bomdica, G. Shippen, H. Sucar, H. Su, J. Chuang, N. Vasseghi, R. Ramchandani, R. Martin, R. Conrad, Y. Chen, W. Voegtli Jr., M. Seddighnezhad, and Y. Van Atta. MIPS R10000 Superscalar Microprocessor. Hot Chips VII, 1995.

Cite this article

Zhang, Y., Haga, S. & Barua, R. Execution History Guided Instruction Prefetching. The Journal of Supercomputing 27, 129–147 (2004). https://doi.org/10.1023/B:SUPE.0000009319.31230.a9