The Misprediction Recovery Cache

International Journal of Parallel Programming

Abstract

In modern processors, deep pipelines are combined with superscalar techniques to allow each pipe stage to process multiple instructions. When such a pipe must be flushed and refilled, as when predicted program flow beyond a branch is subsequently recognized as wrong, the temporary performance loss is significant. While modern branch target buffer (BTB) technology makes this flush/refill penalty fairly rare, the penalty that accrues from the remaining branch mispredictions is a serious impediment to even higher processor performance. Advanced mechanisms that can reduce this residual misprediction penalty can be of enormous value in future microprocessor designs. In this paper we describe the design and performance of a promising new mechanism called the Misprediction Recovery Cache (MRC). The key results of our study are: (1) Small, finite-sized MRCs (16 to 256 entries) can effectively reduce branch penalty in deeply pipelined processors. (2) Commercial benchmarks such as the Winstone benchmarks make better use of larger MRCs because they contain a large number of unique branch instructions, unlike the predominantly technical SPECint benchmarks. (3) MRC hit rates increase with increasing BTB prediction accuracy (by 5 to 200%, depending on MRC size) because better prediction leaves fewer residual mispredictions. (4) For the processor architecture we studied, the MRC resulted in up to 20% improvement in CPI (cycles per instruction). (5) The incremental performance gain achievable by adding an MRC to a modern CISC processor (which uses a BTB with a two-level predictor) is two to three times what was achievable by going from a one-level predictor to a two-level predictor.
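To make the relationship between misprediction penalty, MRC hit rate, and CPI concrete, the following is a minimal first-order sketch. It is not the model used in the paper: the formula, the function name `effective_cpi`, and every parameter value in the example are assumptions chosen purely for illustration. It assumes each mispredicted branch pays the full flush/refill penalty unless it hits in the MRC, in which case it pays a smaller (hypothetical) penalty.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate,
                  flush_penalty, mrc_hit_rate=0.0, mrc_hit_penalty=0.0):
    """First-order CPI estimate (illustrative assumption, not the paper's model).

    base_cpi        -- CPI with no misprediction stalls
    branch_fraction -- fraction of instructions that are branches (0..1)
    mispredict_rate -- fraction of branches the BTB mispredicts (0..1)
    flush_penalty   -- cycles lost on a full pipeline flush/refill
    mrc_hit_rate    -- fraction of mispredictions that hit in the MRC (0..1)
    mrc_hit_penalty -- cycles lost when the MRC supplies the recovery path
    """
    for name, p in (("branch_fraction", branch_fraction),
                    ("mispredict_rate", mispredict_rate),
                    ("mrc_hit_rate", mrc_hit_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")

    # Average cycles lost per mispredicted branch: MRC misses pay the
    # full flush penalty, MRC hits pay the reduced penalty.
    cycles_per_mispredict = ((1.0 - mrc_hit_rate) * flush_penalty
                             + mrc_hit_rate * mrc_hit_penalty)
    return base_cpi + branch_fraction * mispredict_rate * cycles_per_mispredict


if __name__ == "__main__":
    # All numbers below are hypothetical, not measurements from the paper.
    without_mrc = effective_cpi(1.0, 0.2, 0.1, flush_penalty=10)
    with_mrc = effective_cpi(1.0, 0.2, 0.1, flush_penalty=10,
                             mrc_hit_rate=0.5, mrc_hit_penalty=2)
    gain = (without_mrc - with_mrc) / without_mrc
    print(f"CPI without MRC: {without_mrc:.3f}")
    print(f"CPI with MRC:    {with_mrc:.3f}")
    print(f"Relative CPI improvement: {gain:.1%}")
```

Under this simplified view, the benefit of an MRC grows with the flush penalty (that is, with pipeline depth) and with the MRC hit rate, which is consistent in spirit with results (1) and (4) above. It does not capture the interaction between BTB accuracy and MRC hit rate described in result (3).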




About this article

Cite this article

Nanda, A.K., Bondi, J.O. & Dutta, S. The Misprediction Recovery Cache. International Journal of Parallel Programming 26, 383–415 (1998). https://doi.org/10.1023/A:1018798331295


  • DOI: https://doi.org/10.1023/A:1018798331295
