Abstract
The next stream predictor is an accurate branch predictor that provides stream-level sequencing. Each stream prediction covers a full instruction stream, that is, the sequence of instructions from the target of a taken branch to the next taken branch, which may span multiple basic blocks. Because instruction streams are long, the stream predictor can provide high fetch bandwidth and tolerate the prediction table access latency. Enlarging instruction streams is therefore an effective way to improve the behavior of the next stream predictor.
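The stream definition above can be illustrated with a minimal sketch that splits a dynamic instruction trace into streams. The trace format (an address paired with a taken-branch flag), the function name `split_into_streams`, and the example addresses are assumptions made for illustration; they do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical trace record: (instruction address, True if the instruction
# is a branch that was taken). This format is an assumption for illustration.
TraceEntry = Tuple[int, bool]


def split_into_streams(trace: List[TraceEntry]) -> List[Tuple[int, int]]:
    """Return (start_address, length_in_instructions) for each stream.

    A stream starts at the target of a taken branch (or at the first
    instruction of the trace) and ends at the next taken branch.
    """
    streams = []
    start = None
    length = 0
    for pc, taken in trace:
        if start is None:
            start = pc          # first instruction of a new stream
        length += 1
        if taken:               # a taken branch terminates the stream
            streams.append((start, length))
            start, length = None, 0
    if start is not None:       # trailing stream with no terminating branch yet
        streams.append((start, length))
    return streams


trace = [
    (0x100, False),
    (0x104, False),  # e.g. a not-taken conditional branch: the stream continues
    (0x108, False),
    (0x10C, True),   # taken branch: ends the first stream
    (0x200, False),
    (0x204, True),   # taken branch: ends the second stream
    (0x100, False),  # start of a third, still open stream
]
streams = split_into_streams(trace)
```

In this sketch the first stream contains a not-taken branch, so it spans more than one basic block while still being identified by a single start address and length.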
In this paper, we provide a comprehensive analysis of dynamic instruction streams, showing that streams fall into several kinds according to the type of their terminating branch. Consequently, by Amdahl's law, a technique that enlarges only particular kinds of streams yields a limited overall benefit. We propose the multiple stream predictor, a novel mechanism that deals with all kinds of streams by combining single streams into long virtual streams. We show that our multiple stream predictor is able to tolerate the prediction table access latency without the complexity of additional hardware mechanisms such as prediction overriding, and that it also reduces the overall energy consumption of the branch predictor.
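The Amdahl's law argument can be made concrete with a small worked example. The symbols $f$, $s$, and $G$ and the numbers used below are illustrative assumptions, not figures from the paper. Suppose a fraction $f$ of all stream predictions belongs to the kind of stream being enlarged, and the technique reduces the number of predictions needed for that kind by a factor $s$. The same instructions are then covered by fewer predictions, so the average number of instructions per prediction grows by:

```latex
G = \frac{1}{(1 - f) + \dfrac{f}{s}}
\qquad\text{e.g. } f = 0.3,\ s = 2
\ \Rightarrow\ G = \frac{1}{0.7 + 0.15} \approx 1.18
```

Even as $s \to \infty$, the gain in this example is bounded by $1/(1-f) \approx 1.43$, which is why a mechanism that covers all kinds of streams is more attractive than one that targets a single kind.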
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Santana, O.J., Ramirez, A., Valero, M. (2008). Multiple Stream Prediction. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC ALPS 2005 2006. Lecture Notes in Computer Science, vol 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_1
DOI: https://doi.org/10.1007/978-3-540-77704-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77703-8
Online ISBN: 978-3-540-77704-5
eBook Packages: Computer Science, Computer Science (R0)