Achieving reliable system performance by fast recovery of branch miss prediction

https://doi.org/10.1016/j.jnca.2011.03.015

Abstract

Today's technology evolution provides users with inexpensive and powerful computer systems. However, it is argued that system reliability and fault tolerance are necessary in these systems as well. A proper design for a reliable and fault-tolerant computer system requires a trade-off among cost, reliability, and availability. In this paper, we propose a low-cost recovery scheme for reliable system performance. This approach completely eliminates the roll-back overhead on branch misprediction. Thus, the instruction fetcher does not stop, and it fetches instructions from the correct path immediately after the misprediction is detected. By allowing the instruction fetcher to work continuously, the approach prevents the processor from flushing the pipeline even under branch misprediction. Our approach reduces the branch misprediction penalty in order to achieve reliable system performance. It instantly reconstructs the map table for any mispredicted branch, and it outperforms the conventional retirement map table (RMT) method by an average of 10.93%.

Introduction

The rapid rise in the complexity of computer systems makes reliability and fault tolerance increasingly important in system design. In particular, branch prediction is a well-known technique for improving microprocessor performance. Although today's branch predictors show high accuracy, prediction is still a probabilistic approach and is not perfect. Moreover, the misprediction penalty is growing because of aggressive speculation and deeper pipelining. This penalty severely affects reliable system performance because it causes frequent performance drops, longer state recovery time, and lower system reliability. Therefore, it must be handled properly in order to achieve reliable system performance.

There are two main sources of performance degradation from branch mispredictions: the cycles needed to resolve the branch, and the cycles needed to recover the correct architectural state. We focus on reducing the second component, the recovery overhead. The recovery process involves not only canceling the effects of instructions that were speculatively executed, but also reestablishing the correct architectural state. Typically, branch misprediction recovery requires flushing the pipeline, reconstructing the architectural state, and then restarting instruction fetching and register renaming. The renaming of correct-path instructions cannot restart until the register rename map table corresponding to the mispredicted branch is restored.

The most common approaches for restoring the map table are the checkpoint processing and recovery (CPR) method (Akkary et al., 2003), the retirement map table (RMT) method (Hinton et al., 2001), and the history buffer (HB) method (Ranganathan et al., 1997; Smith and Pleszkun, 1985). In the CPR method, backup copies of the in-order state are created periodically, either at every branch or every few cycles. The RMT method uses a retirement map table in addition to the frontend map table. When the mispredicted branch reaches the retire point, the retirement map table is copied to the frontend map table. The HB method uses a stack-like structure to maintain the in-order state superseded by speculative values. On a branch misprediction, the mappings are popped from the HB and written back into the rename map table in reverse order. In practice, the Intel Pentium 4 uses the RMT method to track register mappings, whereas the Alpha 21264 (Kessler, 1999) and MIPS R10000 (Yeager, 1996) use the CPR method to reestablish the rename table.

With these three methods remaining the basis of branch recovery studies, several more recent proposals have been developed: the eager misprediction recovery (EMR) method (Zhou et al., 2005) and selective checkpointing (Gandhi et al., 2004; Akkary et al., 2004; Cristal et al., 2005; Kirman et al., 2005). The EMR method hides the latency of long branch recovery by letting instructions that access correct values continue executing, while forcing instructions that reference incorrect speculative values to wait until the correct data are restored. Selective checkpointing minimizes the CPR overhead and enables fast branch misprediction recovery by creating checkpoints only at low-confidence branches, whereas conventional checkpointing allocates one for every predicted branch.
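
As an illustration only, the three baseline schemes (CPR, RMT, and HB) can be sketched with a toy rename map in Python. The class and method names (`RenameState`, `take_checkpoint`, `recover_hb`, and so on) are hypothetical and are not taken from the cited designs. The model also assumes an unbounded supply of physical registers and ignores register freeing, so it shows only how each scheme restores the map table.

```python
from dataclasses import dataclass, field

NUM_ARCH_REGS = 4  # tiny register file, purely illustrative


@dataclass
class RenameState:
    """Front-end rename map plus the bookkeeping each recovery scheme needs."""
    front_map: list = field(default_factory=lambda: list(range(NUM_ARCH_REGS)))
    retire_map: list = field(default_factory=lambda: list(range(NUM_ARCH_REGS)))
    history: list = field(default_factory=list)      # HB: (arch_reg, old_phys)
    checkpoints: dict = field(default_factory=dict)  # CPR: branch_id -> map copy
    next_phys: int = NUM_ARCH_REGS                   # assumed unbounded supply

    def rename(self, arch_reg):
        """Map arch_reg to a fresh physical register, logging the old mapping."""
        self.history.append((arch_reg, self.front_map[arch_reg]))
        self.front_map[arch_reg] = self.next_phys
        self.next_phys += 1
        return self.front_map[arch_reg]

    def retire(self, arch_reg, phys):
        """RMT: record a mapping as architectural when its instruction retires."""
        self.retire_map[arch_reg] = phys

    def take_checkpoint(self, branch_id):
        """CPR: snapshot the whole front-end map at a branch."""
        self.checkpoints[branch_id] = list(self.front_map)

    def recover_cpr(self, branch_id):
        """CPR: restore the snapshot in one step."""
        self.front_map = list(self.checkpoints[branch_id])

    def recover_rmt(self):
        """RMT: copy the retirement map once the branch reaches retirement."""
        self.front_map = list(self.retire_map)

    def recover_hb(self, history_len_at_branch):
        """HB: pop logged mappings in reverse order back to the branch point."""
        while len(self.history) > history_len_at_branch:
            arch_reg, old_phys = self.history.pop()
            self.front_map[arch_reg] = old_phys


def demo():
    s = RenameState()
    p = s.rename(1)              # correct-path instruction older than the branch
    s.take_checkpoint("b0")      # CPR snapshot at the branch
    mark = len(s.history)        # HB position at the branch
    expected = list(s.front_map)

    s.rename(1)                  # wrong-path renames
    s.rename(2)
    s.recover_hb(mark)
    hb_ok = s.front_map == expected

    s.rename(3)                  # wrong-path rename again
    s.recover_cpr("b0")
    cpr_ok = s.front_map == expected

    s.rename(2)                  # wrong-path rename again
    s.retire(1, p)               # the older instruction retires
    s.recover_rmt()              # the branch has reached the retire point
    rmt_ok = s.front_map == expected
    return hb_ok, cpr_ok, rmt_ok
```

In this sketch all three schemes return the map to the same state. They differ in cost: CPR copies the whole table at every checkpoint, RMT has to wait for the branch to reach the retire point, and HB takes time proportional to the number of wrong-path renames.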

In this paper, we overcome the complex recovery overhead by proposing a fast and efficient recovery mechanism. To this end, we propose incremental register renaming (IRR), which differs from conventional register renaming in that it uses a different reclaiming policy to rename registers. The IRR forces the destination register numbers of the instruction stream to appear in non-decreasing order. We then develop a fast and efficient structure, called the bit-vector based rename map table (BVMT), to reduce the branch recovery overhead. With the incremental property of the IRR, the BVMT recovery scheme completely eliminates the roll-back overhead on branch misprediction. The proposed mechanism does not have to wait until all other pending instructions are completed before starting the recovery process. Thus, the instruction fetcher does not stop, and it fetches instructions from the correct path immediately after the misprediction is detected. The proposed mechanism creates checkpoints efficiently and instantly rolls the map table back to the correct state.
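
To give some intuition for why a non-decreasing allocation order can make recovery cheap, the following Python sketch keeps one bit vector per architectural register and lets a branch remember only the allocation counter. This is an assumption-laden toy and not the BVMT itself, whose details are not given in this excerpt. The class `MonotoneRenamer` and its methods are hypothetical, and the sketch ignores register reclaiming and counter wrap-around.

```python
class MonotoneRenamer:
    """Toy renamer with monotonically increasing physical register numbers.

    Hypothetical illustration only; not the BVMT described in the paper.
    """

    def __init__(self, num_arch_regs):
        # One Python int per architectural register, used as a bit vector:
        # bit p is set if physical register p has been assigned to it.
        self.bits = [1 << r for r in range(num_arch_regs)]
        self.next_phys = num_arch_regs  # allocation counter, only increases

    def rename(self, arch_reg):
        """Assign the next physical register number to arch_reg."""
        phys = self.next_phys
        self.next_phys += 1
        self.bits[arch_reg] |= 1 << phys
        return phys

    def branch_tag(self):
        """A branch records only the current allocation counter."""
        return self.next_phys

    def mapping_at(self, arch_reg, tag):
        """Newest physical register for arch_reg allocated before the tag."""
        older = self.bits[arch_reg] & ((1 << tag) - 1)
        return older.bit_length() - 1  # index of the highest set bit

    def recover(self, tag):
        """Discard every mapping created at or after the branch tag."""
        mask = (1 << tag) - 1
        self.bits = [b & mask for b in self.bits]
        self.next_phys = tag

    def current_map(self):
        """Current mapping: highest set bit of each bit vector."""
        return [b.bit_length() - 1 for b in self.bits]


r = MonotoneRenamer(4)
r.rename(1)                 # correct path: r1 -> p4
tag = r.branch_tag()        # branch remembers the counter (5)
r.rename(1)                 # wrong path: r1 -> p5
r.rename(2)                 # wrong path: r2 -> p6
r.recover(tag)              # masks off p5 and p6 in one step
```

Because numbers are handed out in increasing order, everything allocated after the branch sits above the tag, so recovery in this toy is a single mask per register rather than a walk over logged mappings.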

The rest of this paper is organized as follows. Section 2 presents a brief review of the existing approaches related to register renaming and rename table structures. Sections 3 and 4 present the key ideas of this paper, namely the IRR strategy and the BVMT structure. Section 5 evaluates the performance. Finally, we conclude by summarizing our results in Section 6.

Section snippets

Related work

This section discusses the related literature. We begin by describing the renaming table structures and then briefly introduce related work on branch misprediction recovery.

There are two possibilities for keeping track of the actual mapping of particular architectural registers to allocated rename buffers: SRAM1

Incremental register renaming

Register renaming is a technique to avoid unnecessary serialization of program operations imposed by the reuse of the same destination registers. It resolves false data dependencies (anti-dependencies and output dependencies) that occur in a code segment between the operands of subsequent instructions. The core of the renaming activity takes place in the register map table (RMT). The RMT keeps track of the register mapping from architectural registers to logical rename registers. In
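
As a minimal illustration of how renaming through a map table removes false dependencies, the following Python sketch renames a three-instruction sequence and counts anti-dependencies (WAR) and output dependencies (WAW) before and after. The function names and the `(dest, src1, src2)` instruction encoding are hypothetical, the supply of rename registers is assumed to be unbounded, and the counter is deliberately simple (it compares every earlier instruction with every later one).

```python
def rename_program(program, num_arch_regs):
    """Rename (dest, src1, src2) triples using a simple map table."""
    mapping = {r: r for r in range(num_arch_regs)}
    next_phys = num_arch_regs          # assumed unbounded supply
    renamed = []
    for dest, src1, src2 in program:
        s1, s2 = mapping[src1], mapping[src2]  # sources read the current map
        mapping[dest] = next_phys              # destination gets a fresh register
        renamed.append((next_phys, s1, s2))
        next_phys += 1
    return renamed


def count_false_dependencies(program):
    """Count (WAR, WAW) pairs between each earlier and each later instruction."""
    war = waw = 0
    for i, (d_i, a_i, b_i) in enumerate(program):
        for d_j, _, _ in program[i + 1:]:
            if d_j == d_i:
                waw += 1               # later write to the same destination
            if d_j in (a_i, b_i):
                war += 1               # later write to an earlier source
    return war, waw


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (2, 1, 3),  # r2 = r1 op r3  (WAR on r2 with the first instruction)
    (1, 2, 2),  # r1 = r2 op r2  (WAW on r1, WAR on r1)
]
renamed = rename_program(program, 4)
before = count_false_dependencies(program)   # (2, 1)
after = count_false_dependencies(renamed)    # (0, 0)
```

The true (read-after-write) dependencies survive renaming, since each source still reads the register written by its producer; only the conflicts caused by register reuse disappear.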

Why flush the pipeline on branch recovery?

When a branch misprediction occurs, retiring all instructions prior to the mispredicted branch is the state-of-the-art solution to the branch recovery problem. After recovering from the branch, the processor resumes executing instructions on the correct path. However, this solution severely affects performance, because it takes considerable time to recover the correct machine state from the speculative state. The recovery process involves squashing all instructions on the mispredicted path and

Experimental results

All tests and evaluations were performed with programs from the SPEC CPU2000 benchmark suite (The Standard Performance Evaluation Corporation) on SimpleScalar. The SimpleScalar tool set is a system software infrastructure used to build modeling applications for program performance analysis and detailed microarchitectural modeling. Using the SimpleScalar tools, users can build modeling applications that simulate real programs running on a range of modern processors and systems. The SimpleScalar

Concluding remarks

In this work, we prevent a processor from flushing the pipeline even under branch misprediction by allowing the instruction fetcher to work continuously. To this end, we propose a fast and low-cost branch recovery scheme using incremental register renaming and the bit-vector based rename map table. The BVMT instantly reconstructs the map table corresponding to any branch and rolls back to the correct state, so that the frontend does not stall during the recovery process. Consequently, our

References (24)

  • H. Akkary et al. Checkpoint processing and recovery: towards scalable large instruction window processors
  • H. Akkary et al. An analysis of a resource efficient checkpoint architecture. ACM Transactions on Architecture and Code Optimization (2004)
  • A. Cristal et al. Kilo-instruction processors: overcoming the memory wall. IEEE Micro (2005)
  • Cheng C. The schemes and performances of dynamic branch prediction. Technical Report,...
  • M. Choi et al. An energy efficient instruction window for scalable processor architecture. IEICE Transactions on Electronics (2008)
  • A. Gandhi et al. Reducing branch misprediction penalty via selective branch recovery
  • Hinton G, Sager D, Upton M, Boggs D, Carmean D, Kyker A, et al. The Microarchitecture of the Pentium 4 processor. Intel...
  • A. Joshi et al. Measuring benchmark similarity using inherent program characteristics. IEEE Transactions on Computers (2006)
  • R. Kessler. The alpha 21264 microprocessor. IEEE Micro (1999)
  • Kirman N, Kirman M, Chaudhuri M, Martinez J. Checkpointed early load retirement. In: IEEE international symposium on...
  • C. Kun et al. A power-optimized 64-bit priority encoder utilizing parallel priority look-ahead
  • Y. Lee et al. On effective slack reclamation in task scheduling for energy reduction. Journal of Information Processing Systems (2009)

    This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0022589 and 2010-0025748).
