Abstract
We present the Instruction Register File (IRF) coupled with a basic block translator, aiming to deliver a high instruction fetch rate. The IRF has one write port to load instruction cache blocks into registers. It also has p read ports to fetch up to p basic blocks per cycle from up to p registers. The translator predicts up to p on-path basic blocks per cycle and translates each start address into an IRF reference. The references are used in the fetch stage to read the registers, and the basic block limits serve to merge the accessed registers into a dynamically predicted trace line. The IRF coupled with basic block descriptor tables avoids the need to cache traces, unlike the trace cache micro-architecture. Moreover, the IRF takes the instruction memory hierarchy off the cycle-determining path, just as the data register file does for the data memory hierarchy. The IRF performance is estimated with a SimpleScalar-based simulator run on the Mediabench benchmark suite and compared to the trace cache performance on the same benchmarks. We show that on this benchmark suite, an IRF-based processor fetching up to 3 basic blocks per cycle outperforms a trace-cache-based processor fetching 16-instruction-long traces by 25% on average.
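The fetch mechanism described in the abstract can be illustrated with a minimal behavioural sketch. It is not the paper's design: the names `BlockRef`, `IRF`, `load`, `fetch` and the `descriptors` table are hypothetical, instructions are modelled as strings, and each basic block is assumed to fit inside a single register (a block spanning several cache blocks is not handled). The sketch only shows how up to p references, each carrying the limits of one basic block, could be merged into one trace line.

```python
from dataclasses import dataclass


@dataclass
class BlockRef:
    """Hypothetical IRF reference produced by the translator."""
    reg: int    # IRF register holding the instruction cache block
    start: int  # offset of the basic block's first instruction
    end: int    # offset one past the basic block's last instruction


class IRF:
    """Toy model: one write per load() call, up to p reads per fetch()."""

    def __init__(self, num_regs, p):
        self.regs = [None] * num_regs
        self.p = p

    def load(self, reg, cache_block):
        # Models the single write port: one cache block per call.
        self.regs[reg] = list(cache_block)

    def fetch(self, refs):
        # Models the p read ports: at most p basic blocks per cycle.
        if len(refs) > self.p:
            raise ValueError("more basic blocks than read ports")
        trace = []
        for ref in refs:
            # The basic block limits select the useful slice of the register.
            trace.extend(self.regs[ref.reg][ref.start:ref.end])
        return trace


# Assumed descriptor table: basic block start address -> IRF reference.
descriptors = {
    0x104: BlockRef(reg=0, start=1, end=3),
    0x200: BlockRef(reg=2, start=0, end=2),
    0x10C: BlockRef(reg=0, start=3, end=4),
}

irf = IRF(num_regs=4, p=3)
irf.load(0, ["a0", "a1", "a2", "a3"])
irf.load(2, ["c0", "c1", "c2", "c3"])

# Predicted on-path basic blocks for one cycle, in program order.
predicted = [0x104, 0x200, 0x10C]
trace = irf.fetch([descriptors[addr] for addr in predicted])
print(trace)
```

In this model the same register can be read by two references in one cycle, so the trace line is assembled at fetch time from the registers rather than stored as a trace.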
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goossens, B. (2003). The Instruction Register File. In: Malyshkin, V.E. (eds) Parallel Computing Technologies. PaCT 2003. Lecture Notes in Computer Science, vol 2763. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45145-7_43
DOI: https://doi.org/10.1007/978-3-540-45145-7_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40673-0
Online ISBN: 978-3-540-45145-7
eBook Packages: Springer Book Archive