DOI: 10.1145/2751205.2751228
research-article

Mower: A New Design for Non-blocking Misprediction Recovery

Published: 08 June 2015

Abstract

Mower is a microarchitecture technique that targets the branch misprediction penalty in superscalar processors. It speeds up misprediction recovery by dynamically evicting stale instructions and correcting the Register Alias Table (RAT) using explicit control dependency tracking, which is implemented with simple bit matrices. This low-overhead technique allows the recovery process to overlap with instruction fetching, renaming, and scheduling from the correct path. Our evaluation indicates that the mechanism performs very close to ideal recovery and provides up to a 5% speed-up and a 2% reduction in power consumption compared to a recovery mechanism using a reorder buffer and a walker. The simplicity of the mechanism should permit easy implementation of Mower in an actual processor.
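
The abstract does not specify the matrix layout, so the following is only a rough illustration of how control dependencies might be tracked with a bit matrix. It assumes one row per in-flight instruction and one bit per unresolved branch; the class `ControlDepMatrix`, its methods, and the tag allocation scheme are all hypothetical and are not taken from the paper. RAT correction is not modeled.

```python
class ControlDepMatrix:
    """Toy model (assumed layout): one row per window slot, one bit per branch tag."""

    def __init__(self, window_size, num_branch_tags):
        self.rows = [0] * window_size        # bit b set => slot is younger than unresolved branch b
        self.valid = [False] * window_size   # slot currently holds a live instruction
        self.tag_of = [None] * window_size   # branch tag owned by the slot, if it holds a branch
        self.unresolved = 0                  # bitmask of branch tags still in flight
        self.num_branch_tags = num_branch_tags

    def _free_tag(self):
        for tag in range(self.num_branch_tags):
            if not self.unresolved & (1 << tag):
                return tag
        raise RuntimeError("no free branch tag; a real front end would stall")

    def allocate(self, slot, is_branch=False):
        """Rename-time step: the new instruction depends on every unresolved branch."""
        self.rows[slot] = self.unresolved
        self.valid[slot] = True
        self.tag_of[slot] = None
        if is_branch:
            tag = self._free_tag()
            self.tag_of[slot] = tag
            self.unresolved |= 1 << tag
            return tag
        return None

    def resolve_correct(self, tag):
        """Correct prediction: clear the branch's column so the tag can be reused."""
        mask = ~(1 << tag)
        self.unresolved &= mask
        self.rows = [row & mask for row in self.rows]

    def resolve_mispredicted(self, tag):
        """Misprediction: evict every live slot whose row has the branch's bit set."""
        bit = 1 << tag
        squashed = []
        for slot, row in enumerate(self.rows):
            if self.valid[slot] and row & bit:
                squashed.append(slot)
                self.valid[slot] = False
                self.rows[slot] = 0
                if self.tag_of[slot] is not None:
                    # A squashed younger branch releases its own tag as well.
                    self.unresolved &= ~(1 << self.tag_of[slot])
                    self.tag_of[slot] = None
        self.unresolved &= ~bit
        return squashed


if __name__ == "__main__":
    m = ControlDepMatrix(window_size=8, num_branch_tags=4)
    m.allocate(0)
    b0 = m.allocate(1, is_branch=True)
    m.allocate(2)
    m.allocate(3, is_branch=True)
    m.allocate(4)
    print("squashed slots:", m.resolve_mispredicted(b0))
```

In this sketch the stale instructions are identified by a single column read, without walking the window in program order, which is the kind of property that would let recovery overlap with fetching from the correct path.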



Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015
446 pages
ISBN: 9781450335591
DOI: 10.1145/2751205

Publisher

Association for Computing Machinery
New York, NY, United States


Conference

ICS '15: 2015 International Conference on Supercomputing
June 8-11, 2015
Newport Beach, California, USA

Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)
Overall acceptance rate: 629 of 2,180 submissions (29%)
