Abstract
Resource-efficient checkpoint processors have been shown to recover to an earlier safe state very quickly. To complete misprediction recovery, however, they must also reexecute the code segment between the recovered checkpoint and the mispredicted instruction. This paper evaluates two novel reuse methods that accelerate reexecution paths by reusing the results of instructions and the outcomes of branches obtained during the first run. The paper also evaluates, in the context of checkpoint processors, two other reuse methods targeting trivial and repetitive arithmetic operations. A reuse approach combining all four methods requires an area of 0.87 mm², consumes 51.6 mW, and improves the energy-delay product by 4.8% and 11.85% for the integer and floating-point benchmarks, respectively.
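To make the last two reuse ideas concrete, the following is a minimal software sketch of bypassing trivial operations and reusing repeated ones, assuming integer operands and a small least-recently-used lookup table. It is an illustration only, not the hardware design evaluated in the chapter; the names `ReuseTable`, `trivial_result`, and `execute`, the table capacity, and the set of trivial cases are all hypothetical.

```python
from collections import OrderedDict


class ReuseTable:
    """Hypothetical small LRU table keyed by (opcode, operand_a, operand_b)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        # Return the stored result and mark the entry as recently used.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        return None

    def insert(self, key, value):
        # Store the result and evict the least recently used entry if full.
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def trivial_result(op, a, b):
    """Return the result of a trivial operation, or None if it is not trivial.

    Assumed trivial cases: multiply by 0 or 1, divide by 1.
    """
    if op == "mul":
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
    if op == "div" and b == 1:
        return a
    return None


def execute(op, a, b, table, stats):
    """Execute an integer 'mul' or 'div', preferring bypass and reuse."""
    result = trivial_result(op, a, b)
    if result is not None:
        stats["trivial"] += 1      # bypassed without using the functional unit
        return result
    key = (op, a, b)
    cached = table.lookup(key)
    if cached is not None:
        stats["reused"] += 1       # repetitive operation found in the table
        return cached
    result = a * b if op == "mul" else a // b
    stats["computed"] += 1         # full computation, then remember the result
    table.insert(key, result)
    return result


stats = {"trivial": 0, "reused": 0, "computed": 0}
table = ReuseTable(capacity=2)
for op, a, b in [("mul", 7, 1), ("mul", 6, 7), ("mul", 6, 7), ("div", 84, 2)]:
    execute(op, a, b, table, stats)
print(stats)
```

In this sketch the trivial check runs before the table lookup, so trivial operations never occupy table entries; that ordering is a design choice made for the illustration, not a claim about the chapter's implementation.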
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Golander, A., Weiss, S. (2009). Reexecution and Selective Reuse in Checkpoint Processors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers II. Lecture Notes in Computer Science, vol 5470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00904-4_13
DOI: https://doi.org/10.1007/978-3-642-00904-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00903-7
Online ISBN: 978-3-642-00904-4
eBook Packages: Computer Science (R0)