Abstract
This paper presents a novel high-performance substrate for building energy-efficient out-of-order superscalar cores. The architecture does not require a reorder buffer or physical registers for register renaming and instruction retirement. Instead, it uses a large number of virtual register IDs for register renaming, a physical register file of the same size as the logical register file, and checkpoints to bulk-retire instructions and to recover from exceptions and branch mispredictions. By eliminating physical register renaming and the reorder buffer, the architecture not only removes complex, power-hungry hardware structures but also reduces reorder buffer capacity stalls when execution encounters long delays from data cache misses, thus improving performance. The paper presents a performance and power evaluation of this new architecture using SPEC 2006 benchmarks. The performance data were collected using an x86 ASIM-based performance simulator from Intel Labs. The data show that the new architecture improves the performance of a 2-wide out-of-order x86 processor core by an average of 4.2%, while saving 43% of the energy consumption of the reorder buffer and retirement register file functional block.
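As a rough illustration of the idea in the abstract, the following toy Python model renames destination registers to virtual IDs and uses a snapshot of the rename table as a checkpoint. It is only a sketch under stated assumptions, not the design evaluated in the paper: the class `RenameState`, its methods, the constant `NUM_LOGICAL`, and the choice to hand out virtual IDs from a simple counter are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

NUM_LOGICAL = 4  # hypothetical, tiny logical register count for illustration


@dataclass
class RenameState:
    """Toy rename table: logical register -> virtual ID of its latest producer.

    None means no in-flight producer is tracked for that register.
    This is an assumed simplification, not the paper's hardware structure.
    """
    table: List[Optional[int]] = field(
        default_factory=lambda: [None] * NUM_LOGICAL
    )
    next_vid: int = 0  # virtual IDs are drawn from a large ID space (a counter here)

    def rename(self, dest: int, srcs: List[int]) -> Tuple[int, List[Optional[int]]]:
        """Rename one instruction; return (dest virtual ID, source tags)."""
        # Sources are looked up before the destination mapping is updated,
        # so an instruction that reads and writes the same register sees
        # the previous producer.
        src_tags = [self.table[s] for s in srcs]
        vid = self.next_vid
        self.next_vid += 1
        self.table[dest] = vid
        return vid, src_tags

    def checkpoint(self) -> List[Optional[int]]:
        """Snapshot the rename table (for example, at a low-confidence branch)."""
        return list(self.table)

    def restore(self, snapshot: List[Optional[int]]) -> None:
        """Roll the rename table back to a snapshot after a misprediction."""
        self.table = list(snapshot)
```

In this sketch, renaming `r1 = r2 + r3` and then `r2 = r1` gives the second instruction a source tag equal to the first instruction's virtual ID. Restoring a checkpoint taken before a later write to `r1` brings back the earlier mapping, with no per-instruction reorder buffer entry involved in the model.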
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sharafeddine, M., Akkary, H., Carmean, D. (2013). Virtual Register Renaming. In: Kubátová, H., Hochberger, C., Daněk, M., Sick, B. (eds) Architecture of Computing Systems – ARCS 2013. ARCS 2013. Lecture Notes in Computer Science, vol 7767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36424-2_8
DOI: https://doi.org/10.1007/978-3-642-36424-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36423-5
Online ISBN: 978-3-642-36424-2
eBook Packages: Computer Science (R0)