Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6590)

Abstract

Current processors require a large number of in-flight instructions to find more parallelism and to hide the growing gap between memory latency and processor cycle time. These in-flight instructions are typically stored in a centralized structure called the reorder buffer (ROB), which is central to handling precise exceptions and to recovering a safe state after a branch misprediction. However, this structure is becoming so large that it is difficult to fit within the power budget of future processor designs. In this paper we propose a novel ROB microarchitecture named CROB (Compressed ROB) that compresses ROB entries and therefore gives the illusion of a virtual ROB larger than the number of physical ROB entries. Our performance study of CROB shows substantial benefits, with average speedups of 20% and 12% for a 128-entry and a 256-entry ROB, respectively. For some benchmark suites, such as SPECfp2000, speedups reach up to 30%.
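To make the "virtual ROB" idea concrete, here is a minimal Python sketch of a toy reorder buffer in which one physical entry can hold several instructions, so the number of instructions in flight can exceed the number of physical entries. The packing rule (an instruction flagged `compressible` may join the youngest entry only if that entry is also a compressible group with a free slot), the class `ToyCompressedRob`, and the parameters `physical_entries` and `slots_per_entry` are all hypothetical illustrations; this is not the compression mechanism the chapter describes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RobEntry:
    # One physical ROB entry; in this toy model it may hold several instructions.
    instrs: list = field(default_factory=list)
    compressed: bool = False  # True if the entry is a group of "compressible" instructions


class ToyCompressedRob:
    """Hypothetical toy model, not the CROB design: physical_entries slots,
    each able to pack up to slots_per_entry instructions flagged compressible."""

    def __init__(self, physical_entries: int, slots_per_entry: int):
        self.physical_entries = physical_entries
        self.slots_per_entry = slots_per_entry
        self.entries = deque()  # oldest entry at the left, youngest at the right

    def in_flight(self) -> int:
        # Number of instructions currently tracked (the "virtual" ROB occupancy).
        return sum(len(e.instrs) for e in self.entries)

    def dispatch(self, name: str, compressible: bool) -> bool:
        """Insert an instruction in program order; return False if dispatch stalls."""
        tail = self.entries[-1] if self.entries else None
        if (compressible and tail is not None and tail.compressed
                and len(tail.instrs) < self.slots_per_entry):
            tail.instrs.append(name)  # pack into the youngest physical entry
            return True
        if len(self.entries) == self.physical_entries:
            return False  # every physical entry is occupied
        self.entries.append(RobEntry([name], compressed=compressible))
        return True

    def commit_head(self) -> list:
        """Retire the oldest physical entry and return its instructions in order."""
        return self.entries.popleft().instrs if self.entries else []


# Usage: 4 physical entries, up to 4 instructions packed per entry.
rob = ToyCompressedRob(physical_entries=4, slots_per_entry=4)
stream = [("add", True), ("sub", True), ("ld", False), ("mul", True),
          ("xor", True), ("and", True), ("br", False), ("or", True)]
dispatched = 0
for name, ok in stream:
    if not rob.dispatch(name, ok):
        break  # stall: no physical entry available
    dispatched += 1
print(dispatched, rob.in_flight())  # prints "7 7"
```

Here seven instructions occupy only four physical entries before dispatch stalls. The toy deliberately omits what a real design must solve, such as squashing part of a packed entry on a branch misprediction and identifying the faulting instruction for a precise exception.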

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Latorre, F., Magklis, G., González, J., Chaparro, P., González, A. (2011). CROB: Implementing a Large Instruction Window through Compression. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers III. Lecture Notes in Computer Science, vol 6590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19448-1_7

  • DOI: https://doi.org/10.1007/978-3-642-19448-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19447-4

  • Online ISBN: 978-3-642-19448-1

  • eBook Packages: Computer Science, Computer Science (R0)