Abstract
Current processors require a large number of in-flight instructions in order to expose further parallelism and hide the growing gap between memory latency and processor cycle time. These in-flight instructions are typically stored in a centralized structure called the reorder buffer (ROB), which is central to handling precise exceptions and recovering a safe state in the event of a branch misprediction. However, this structure is becoming so large that it is difficult to fit within the power budget of future processor designs. In this paper we propose a novel ROB microarchitecture named CROB (Compressed ROB) that compresses ROB entries and therefore gives the illusion of a virtual ROB larger than the actual number of ROB entries. The performance study of CROB shows a substantial benefit, with average speedups of 20% and 12% for a 128-entry and a 256-entry ROB, respectively. For some benchmark categories, such as SpecFP2000, the speedup reaches up to 30%.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Latorre, F., Magklis, G., González, J., Chaparro, P., González, A. (2011). CROB: Implementing a Large Instruction Window through Compression. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers III. Lecture Notes in Computer Science, vol 6590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19448-1_7
DOI: https://doi.org/10.1007/978-3-642-19448-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19447-4
Online ISBN: 978-3-642-19448-1
eBook Packages: Computer Science, Computer Science (R0)