Abstract
Superscalar processors tolerate long-latency memory operations by keeping a large number of instructions in flight. Because the gap between processor and memory speed keeps widening, the number of in-flight instructions needed to cover future memory access latencies will keep growing as well. However, scaling up the structures that current processors use to support so many in-flight instructions is impractical because of area, power consumption, and cycle time constraints.
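As a rough illustration of why the instruction window must grow with memory latency, the sketch below applies Little's law: to stay busy during one memory access, a processor needs about latency (in cycles) times sustained instructions per cycle in flight. The function name and the example figures (a 4-wide core and latencies of 100, 500, and 1000 cycles) are illustrative assumptions, not values taken from the paper.

```python
import math


def in_flight_needed(latency_cycles, ipc):
    """Instructions that must be in flight to hide one memory access (Little's law)."""
    return math.ceil(latency_cycles * ipc)


# Assumed example: a core sustaining 4 instructions per cycle.
for latency in (100, 500, 1000):
    print(latency, in_flight_needed(latency, ipc=4.0))
```

Under these assumed figures, a 500-cycle latency already calls for about 2000 in-flight instructions, far more than the windows of a few hundred entries or fewer that conventional designs provide.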
The kilo-instruction processor is an affordable architecture that tolerates memory access latency by supporting thousands of in-flight instructions. Instead of simply enlarging the processor structures, the kilo-instruction architecture relies on an efficient multi-checkpointing mechanism. Multi-checkpointing builds on a set of techniques such as multi-level instruction queues, late register allocation, and early register release. These techniques emphasize the intelligent use of the available resources and avoid scalability problems in the design of the critical processor structures. Furthermore, the kilo-instruction architecture is orthogonal to other architectures, such as multiprocessors and vector processors, and can be combined with them to boost overall processor performance.
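The abstract does not spell out how multi-checkpointing works, so the following is only a toy model of the general idea of checkpoint-based commit, not the authors' design. It assumes that a checkpoint is taken every fixed number of instructions, that a checkpoint commits once all the instructions it covers have completed, and that a fault rolls execution back to the checkpoint covering the faulting instruction. The class names, the `interval` policy, and the method names are all hypothetical.

```python
class Checkpoint:
    """State saved at one point in the instruction stream (toy model)."""

    def __init__(self, first_seq):
        self.first_seq = first_seq  # first instruction covered by this checkpoint
        self.pending = 0            # covered instructions that have not completed


class CheckpointTable:
    def __init__(self, interval):
        self.interval = interval    # assumed policy: one checkpoint per `interval` instructions
        self.checkpoints = []       # oldest first
        self.next_seq = 0

    def dispatch(self):
        """Enter one instruction into the window and return its sequence number."""
        newest = self.checkpoints[-1] if self.checkpoints else None
        if newest is None or self.next_seq - newest.first_seq >= self.interval:
            newest = Checkpoint(self.next_seq)
            self.checkpoints.append(newest)
        newest.pending += 1
        self.next_seq += 1
        return self.next_seq - 1

    def _owner_index(self, seq):
        # The owner is the youngest checkpoint that starts at or before `seq`.
        for i in range(len(self.checkpoints) - 1, -1, -1):
            if self.checkpoints[i].first_seq <= seq:
                return i
        raise KeyError(seq)

    def complete(self, seq):
        """Mark an instruction finished; return first_seq of each checkpoint committed."""
        self.checkpoints[self._owner_index(seq)].pending -= 1
        committed = []
        # The youngest checkpoint stays open, so it is never committed here.
        while len(self.checkpoints) > 1 and self.checkpoints[0].pending == 0:
            committed.append(self.checkpoints.pop(0).first_seq)
        return committed

    def rollback(self, seq):
        """Recover from a fault at `seq`: squash back to its checkpoint, return restart point."""
        i = self._owner_index(seq)
        del self.checkpoints[i + 1:]
        self.checkpoints[i].pending = 0
        self.next_seq = self.checkpoints[i].first_seq
        return self.next_seq


table = CheckpointTable(interval=4)
for _ in range(10):
    table.dispatch()              # checkpoints start at 0, 4, 8
for seq in (4, 5, 6, 7, 0, 1, 2):
    table.complete(seq)           # nothing commits: checkpoint 0 is still pending
print(table.complete(3))          # [0, 4]
print(table.rollback(9))          # 8
```

In this model the bookkeeping cost grows with the number of checkpoints rather than with the number of in-flight instructions, which is one way to read the abstract's claim that the critical structures need not be scaled up.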
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cristal, A., Santana, O.J., Valero, M. (2004). Maintaining Thousands of In-flight Instructions. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_2
DOI: https://doi.org/10.1007/978-3-540-27866-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive