Reexecution and Selective Reuse in Checkpoint Processors

Golander, Amit; Weiss, Shlomo

doi:10.1007/978-3-642-00904-4_13

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 5470)

Abstract

Resource-efficient checkpoint processors have been shown to recover to an earlier safe state very quickly. To complete misprediction recovery, however, they must also reexecute the code segment between the recovered checkpoint and the mispredicted instruction. This paper evaluates two novel reuse methods that accelerate reexecution paths by reusing the results of instructions and the outcomes of branches obtained during the first run. The paper also evaluates, in the context of checkpoint processors, two other reuse methods targeting trivial and repetitive arithmetic operations. A reuse approach combining all four methods requires an area of 0.87 mm², consumes 51.6 mW, and improves the energy-delay product by 4.8% and 11.85% for the integer and floating-point benchmarks, respectively.
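The last two reuse methods named in the abstract, for trivial and for repetitive arithmetic operations, can be illustrated with a small software analogy. The sketch below is not the authors' hardware design: the function `trivial_result`, the class `ReuseTable`, the operation names, and the table capacity are all hypothetical, and it assumes integer operands so that identities such as x * 0 = 0 hold without floating-point special cases.

```python
from collections import OrderedDict


def trivial_result(op, a, b):
    """Return the result of a trivial integer operation, or None if not trivial.

    Assumption: integer operands only (x * 0 == 0 is not safe for NaN/inf).
    """
    if op == "mul":
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
    elif op == "add":
        if a == 0:
            return b
        if b == 0:
            return a
    elif op == "div":
        if b == 1:
            return a
    return None


class ReuseTable:
    """Hypothetical small LRU table that memoizes repetitive operations."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # (op, a, b) -> result
        self.hits = 0

    def lookup_or_compute(self, op, a, b, compute):
        key = (op, a, b)
        if key in self.entries:
            # Same opcode and operands seen before: reuse the stored result.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        result = compute(a, b)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return result


if __name__ == "__main__":
    table = ReuseTable(capacity=2)
    for _ in range(3):
        table.lookup_or_compute("div", 84, 7, lambda x, y: x // y)
    print(trivial_result("mul", 7, 0), table.hits)
```

In this analogy, a trivial operation is resolved from its operands alone, while a repetitive operation is resolved by a table lookup keyed on the opcode and operand values; in both cases the full computation is skipped.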

Author information

Authors and Affiliations

Tel Aviv University, Tel Aviv, 69978, Israel
Amit Golander & Shlomo Weiss

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

About this chapter

Cite this chapter

Golander, A., Weiss, S. (2009). Reexecution and Selective Reuse in Checkpoint Processors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers II. Lecture Notes in Computer Science, vol 5470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00904-4_13

DOI: https://doi.org/10.1007/978-3-642-00904-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00903-7
Online ISBN: 978-3-642-00904-4
eBook Packages: Computer Science, Computer Science (R0)
