Alloyed Branch History: Combining Global and Local Branch History for Robust Performance

International Journal of Parallel Programming

Abstract

This paper introduces alloyed prediction, a new hardware-based two-level branch predictor organization that combines global and local history in the same structure, combining the advantages of current two-level predictors with those of hybrid predictors. The alloyed organization is motivated by measurements showing that wrong-history mispredictions are even more important than conflict-induced mispredictions. Wrong-history mispredictions arise because current two-level, history-based predictors provide only global or only local history. The contribution of wrong history to the overall misprediction rate is substantial because most programs have some branches that require global history and others that require local history. This paper explores several ways to implement alloyed prediction, including the previously proposed bi-mode organization. Simulations show that mshare is the best alloyed organization among those we examine, and that mshare gives reliably good prediction compared to bimodal (“two-bit”), two-level, and hybrid predictors. The robust performance of alloying across a range of predictor sizes stems from its ability to attack wrong-history mispredictions at even very small sizes without subdividing the branch prediction hardware into smaller and less effective components.
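
The idea of combining both kinds of history in one structure can be illustrated with a minimal sketch. It assumes the table index is formed by concatenating global-history bits, local-history bits, and branch-address bits to select one two-bit counter; the abstract does not spell out the indexing scheme, so this layout is an assumption. The names `AlloyedPredictor` and `count_mispredictions`, the bit widths, and the table sizes are hypothetical choices for illustration, and this is not a rendering of the mshare variant.

```python
class AlloyedPredictor:
    """Toy alloyed-index predictor (illustrative assumption, not the paper's design).

    One table of 2-bit counters is indexed by global history, local history,
    and low branch-address bits concatenated together.
    """

    def __init__(self, g_bits=4, l_bits=4, pc_bits=4, bht_entries=64):
        self.g_bits, self.l_bits, self.pc_bits = g_bits, l_bits, pc_bits
        self.ghr = 0                      # global history register
        self.bht = [0] * bht_entries      # per-branch local history registers
        # 2-bit saturating counters, initialised to "weakly not taken" (1)
        self.pht = [1] * (1 << (g_bits + l_bits + pc_bits))

    def _index(self, pc):
        g = self.ghr & ((1 << self.g_bits) - 1)
        l = self.bht[pc % len(self.bht)] & ((1 << self.l_bits) - 1)
        p = pc & ((1 << self.pc_bits) - 1)
        # Concatenate: [global history | local history | address bits]
        return (g << (self.l_bits + self.pc_bits)) | (l << self.pc_bits) | p

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into both histories."""
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.g_bits) - 1)
        slot = pc % len(self.bht)
        self.bht[slot] = ((self.bht[slot] << 1) | int(taken)) & ((1 << self.l_bits) - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor; return a per-outcome list of misses."""
    misses = []
    for taken in outcomes:
        misses.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return misses
```

Because every counter lookup sees both history types at once, a branch that needs local history and a branch that needs global history can share the same table rather than being served by separate component predictors, which is the property the abstract credits for robustness at small sizes.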

Cite this article

Lu, Z., Lach, J., Stan, M.R. et al. Alloyed Branch History: Combining Global and Local Branch History for Robust Performance. International Journal of Parallel Programming 31, 137–177 (2003). https://doi.org/10.1023/A:1022669325321

  • DOI: https://doi.org/10.1023/A:1022669325321