An Integrated Partitioning and Scheduling Based Branch Decoupling

  • Conference paper
Advances in Computer Systems Architecture (ACSAC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3740)

Abstract

Control hazards induced by conditional branches cause significant performance loss in modern out-of-order superscalar processors. Dynamic branch prediction techniques help alleviate the penalties associated with conditional branch instructions, but branches remain one of the main hurdles to achieving higher ILP. Dynamic branch prediction relies on the temporal locality of branches and the spatial correlations between them. Branch decoupling is an alternative mechanism that exploits the innate lead of the branch schedule with respect to the rest of the computation. The compiler is responsible for generating two maximally decoupled instruction streams: the branch stream and the program stream. Our earlier work on trace-based evaluation of branch decoupling demonstrated a performance advantage of between 12% and 46% over 2-level branch prediction. However, how much of this gain is achievable through static, compiler-driven decoupling is not known. This paper partially answers that question. A novel decoupling algorithm that integrates graph bi-partitioning and scheduling was deployed in the GNU C compiler to generate executables with two instruction streams. These executables were targeted to a branch decoupled architecture simulator with superscalar cores for the branch stream and program stream processors. Simulations show an average performance improvement of 7.7% for the integer benchmarks and 5.5% for the floating point benchmarks of the SPEC2000 benchmark suite.
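
To make the idea of two decoupled streams concrete, the following is a minimal sketch, not the paper's integrated partitioning and scheduling algorithm. It assumes a toy data-dependence graph and uses a plain backward-slice heuristic: every instruction that a conditional branch transitively depends on goes to the branch stream, and everything else goes to the program stream. The function name `decouple`, the instruction names, and the edge list are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def decouple(instrs, deps, branches):
    """Split instructions into a branch stream and a program stream.

    instrs:   instruction names in program order (hypothetical toy input)
    deps:     (producer, consumer) data-dependence edges
    branches: names of the conditional branch instructions
    """
    # Map each instruction to the instructions that produce its operands.
    producers = defaultdict(set)
    for src, dst in deps:
        producers[dst].add(src)

    # Backward slice: collect everything a branch transitively depends on.
    in_branch_stream = set()
    worklist = list(branches)
    while worklist:
        node = worklist.pop()
        if node in in_branch_stream:
            continue
        in_branch_stream.add(node)
        worklist.extend(producers[node])

    # Keep program order inside each stream.
    branch_stream = [i for i in instrs if i in in_branch_stream]
    program_stream = [i for i in instrs if i not in in_branch_stream]

    # Edges that cross the partition would need inter-stream communication.
    cut = [(s, d) for s, d in deps
           if (s in in_branch_stream) != (d in in_branch_stream)]
    return branch_stream, program_stream, cut


instrs = ["load_i", "cmp", "br", "load_a", "mul", "store"]
deps = [("load_i", "cmp"), ("cmp", "br"),
        ("load_i", "load_a"), ("load_a", "mul"), ("mul", "store")]
branch_stream, program_stream, cut = decouple(instrs, deps, {"br"})
print(branch_stream)   # instructions feeding the branch, plus the branch
print(program_stream)  # the remaining computation
print(cut)             # dependences that cross between the two streams
```

In this toy input the only crossing dependence is the one from `load_i` to `load_a`. A bi-partitioning formulation of the kind the abstract describes would additionally weigh the size of such a cut against the scheduling lead gained by the branch stream; this sketch does not attempt that trade-off.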

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ramarao, P., Tyagi, A. (2005). An Integrated Partitioning and Scheduling Based Branch Decoupling. In: Srikanthan, T., Xue, J., Chang, CH. (eds) Advances in Computer Systems Architecture. ACSAC 2005. Lecture Notes in Computer Science, vol 3740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572961_21

  • DOI: https://doi.org/10.1007/11572961_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29643-0

  • Online ISBN: 978-3-540-32108-8

  • eBook Packages: Computer Science (R0)
