Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 5470)

Abstract

Resource-efficient checkpoint processors have been shown to recover to an earlier safe state very quickly. To complete misprediction recovery, however, they must also reexecute the code segment between the recovered checkpoint and the mispredicted instruction. This paper evaluates two novel reuse methods that accelerate reexecution by reusing the instruction results and branch outcomes obtained during the first run. It also evaluates, in the context of checkpoint processors, two other reuse methods that target trivial and repetitive arithmetic operations. A reuse approach combining all four methods requires an area of 0.87 mm², consumes 51.6 mW, and improves the energy-delay product by 4.8% and 11.85% for the integer and floating-point benchmarks, respectively.
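The two arithmetic reuse ideas named in the abstract, bypassing trivial operations and reusing the results of repeated ones, can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions and not the hardware design evaluated in the chapter: the names `trivial_result`, `ReuseTable`, and `execute`, the integer-only operations, the table capacity, and the LRU replacement policy are all hypothetical choices made for the example.

```python
from collections import OrderedDict

# Hypothetical integer operations used only for this illustration.
OPS = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
    "div": lambda x, y: x // y,
}


def trivial_result(op, a, b):
    """Return the result when the operands make the operation trivial, else None."""
    if op == "add":
        if a == 0:
            return b
        if b == 0:
            return a
    elif op == "mul":
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
    elif op == "div":
        if b == 1:
            return a
        if a == 0 and b != 0:
            return 0
    return None


class ReuseTable:
    """Small memoization table keyed by (op, a, b); LRU replacement is an assumption."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, op, a, b):
        key = (op, a, b)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None

    def insert(self, op, a, b, result):
        self.entries[(op, a, b)] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def execute(op, a, b, table):
    """Return (result, source), where source is 'trivial', 'reused', or 'computed'."""
    result = trivial_result(op, a, b)
    if result is not None:
        return result, "trivial"
    result = table.lookup(op, a, b)
    if result is not None:
        return result, "reused"
    result = OPS[op](a, b)
    table.insert(op, a, b, result)
    return result, "computed"
```

In this model, a call such as `execute("div", 84, 4, table)` computes the quotient the first time and serves it from the table on a repeat, while `execute("mul", 7, 1, table)` never reaches the table at all. Checking trivial cases first keeps them from occupying table entries, which is a design choice of this sketch rather than a claim about the chapter's implementation.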

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Golander, A., Weiss, S. (2009). Reexecution and Selective Reuse in Checkpoint Processors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers II. Lecture Notes in Computer Science, vol 5470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00904-4_13

  • DOI: https://doi.org/10.1007/978-3-642-00904-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00903-7

  • Online ISBN: 978-3-642-00904-4

  • eBook Packages: Computer Science, Computer Science (R0)
