Multiple Stream Prediction

Santana, Oliverio J.; Ramirez, Alex; Valero, Mateo

doi:10.1007/978-3-540-77704-5_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4759)

Abstract

The next stream predictor is an accurate branch predictor that provides stream-level sequencing. Every stream prediction contains a full stream of instructions, that is, a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks. Because instruction streams are long, the stream predictor can provide high fetch bandwidth and tolerate the prediction table access latency. Therefore, an effective way to improve the behavior of the next stream predictor is to enlarge instruction streams.
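
As an illustration of the stream definition above, the following minimal sketch splits a dynamic instruction trace into streams. It is not the authors' implementation; the trace format (pairs of instruction address and a taken-branch flag), the `Stream` record, and the function name `split_into_streams` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical trace record: (instruction address, is this a taken branch?).
TraceEntry = Tuple[int, bool]


@dataclass
class Stream:
    start: int   # address of the first instruction (target of the previous taken branch)
    length: int  # instruction count, including the terminating taken branch


def split_into_streams(trace: Iterable[TraceEntry]) -> List[Stream]:
    """Cut a dynamic trace at every taken branch."""
    streams: List[Stream] = []
    start = None
    length = 0
    for addr, taken in trace:
        if start is None:
            start = addr  # first instruction after a taken branch opens a new stream
        length += 1
        if taken:
            streams.append(Stream(start, length))
            start, length = None, 0
    # A trailing partial stream without a terminating taken branch is dropped.
    return streams


# Illustrative trace: not-taken branches do not end a stream, so a single
# stream may span several basic blocks.
trace = [
    (0x100, False),
    (0x104, False),  # e.g. a not-taken conditional branch: the stream continues
    (0x108, False),
    (0x10C, True),   # taken branch: ends the first stream
    (0x200, False),
    (0x204, True),   # taken branch: ends the second stream
]
streams = split_into_streams(trace)
```

In this example the first stream starts at `0x100` and holds four instructions, and the second starts at `0x200` and holds two.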

In this paper, we provide a comprehensive analysis of dynamic instruction streams, showing that there are several kinds of streams according to the type of the terminating branch. Consequently, focusing on particular kinds of streams is not a good strategy due to Amdahl’s law. We propose the multiple stream predictor, a novel mechanism that deals with all kinds of streams by combining single streams into long virtual streams. We show that our multiple stream predictor is able to tolerate the prediction table access latency without the complexity of additional hardware mechanisms such as prediction overriding, while also reducing the overall branch predictor energy consumption.
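
The Amdahl’s law argument can be made concrete with a small worked bound. This is an illustrative sketch, not a result from the paper: the symbols $f$ (the fraction of fetch time spent on the one kind of stream that a technique targets) and $s$ (the improvement factor achieved on that fraction) are assumptions introduced here.

```latex
% Overall improvement S when only a fraction f of the work is improved by a factor s
S = \frac{1}{(1 - f) + \dfrac{f}{s}} \;\le\; \frac{1}{1 - f}

% Made-up numbers for illustration: f = 0.3, s = 2
S = \frac{1}{0.7 + 0.15} = \frac{1}{0.85} \approx 1.18,
\qquad \frac{1}{1 - f} = \frac{1}{0.7} \approx 1.43
```

However large $s$ becomes, the overall gain stays below $1/(1-f)$, which is why a mechanism that covers every kind of stream is preferable to one that targets a single kind.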

Author information

Authors and Affiliations

Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
Oliverio J. Santana, Alex Ramirez & Mateo Valero
Barcelona Supercomputing Center, Barcelona, Spain
Alex Ramirez & Mateo Valero

Editor information

Jesús Labarta, Kazuki Joe, Toshinori Sato

Cite this paper

Santana, O.J., Ramirez, A., Valero, M. (2008). Multiple Stream Prediction. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC ALPS 2005 2006. Lecture Notes in Computer Science, vol 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_1

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77703-8
Online ISBN: 978-3-540-77704-5
eBook Packages: Computer Science, Computer Science (R0)
