Exploring branch target buffer access filtering for low-energy and high-performance microarchitectures

S. Wang; J. Hu; S.G. Ziavras

DOI: 10.1049/iet-cdt.2010.0102

Journal: IET Computers & Digital Techniques


Author(s): S. Wang¹; J. Hu²; S.G. Ziavras³
- Affiliations: 1: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, People's Republic of China
  2: National Key Laboratory for Novel Software Technology, Intel Corporation, Portland, USA
  3: Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, USA
Source: Volume 6, Issue 1, January 2012, p. 50–58
DOI: 10.1049/iet-cdt.2010.0102, Print ISSN 1751-8601, Online ISSN 1751-861X

Superscalar and simultaneous multi-threading (SMT) processors employ powerful branch predictors together with a large branch target buffer (BTB) to exploit instruction-level and thread-level parallelism. However, the large BTB not only dominates predictor energy consumption but also becomes a major obstacle to faster clock frequencies in deep sub-micron technologies. The authors propose a filtering scheme that dramatically reduces the number of BTB accesses, significantly lowering BTB energy consumption while maintaining performance. For a simulated superscalar microprocessor, the experimental evaluation shows that the BTB access filtering (BAF) design achieves an 88.5% reduction in dynamic energy with negligible performance loss. The authors also study leakage behaviour and its control in the BAF design; the results show that applying a drowsy strategy yields very effective leakage control. For high-performance designs, the BAF can also improve the performance scalability of the BTB at new technology nodes. For the SMT environment, the authors evaluate the effectiveness of the BAF design and propose a banked BAF (BK-BAF) scheme to further reduce energy consumption and performance overhead. The experimental results confirm that the BK-BAF scheme can be an energy- and performance-effective design for next-generation SMT processors.
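
To make the energy argument in the abstract concrete, the following is a minimal accounting sketch, not the authors' model. It assumes that a filter is consulted on every fetch at a small fixed cost and that it lets some fraction of fetches skip the BTB lookup entirely. The function name, the parameters and the numbers in the usage note are all hypothetical and are not taken from the paper.

```python
def btb_dynamic_energy_saving(filter_rate, btb_energy, filter_energy):
    """Fractional dynamic-energy saving of a filtered BTB versus an unfiltered one.

    Assumed toy model (not from the paper):
      filter_rate   -- fraction of fetches whose BTB lookup is skipped, in [0, 1]
      btb_energy    -- energy of one BTB access, arbitrary units, > 0
      filter_energy -- energy of one filter lookup, paid on every fetch, >= 0
    """
    if not 0.0 <= filter_rate <= 1.0:
        raise ValueError("filter_rate must be in [0, 1]")
    if btb_energy <= 0 or filter_energy < 0:
        raise ValueError("btb_energy must be > 0 and filter_energy >= 0")

    # Unfiltered design: every fetch pays for one BTB access.
    baseline_per_fetch = btb_energy
    # Filtered design: every fetch pays for the filter, and only the
    # fetches that are not filtered out go on to access the BTB.
    filtered_per_fetch = filter_energy + (1.0 - filter_rate) * btb_energy

    return 1.0 - filtered_per_fetch / baseline_per_fetch


if __name__ == "__main__":
    # Hypothetical inputs: 90% of lookups filtered, filter costs 2% of a BTB access.
    saving = btb_dynamic_energy_saving(0.9, 1.0, 0.02)
    print(f"dynamic energy saving: {saving:.1%}")
```

Under this model the saving equals the filter rate minus the relative cost of the filter, so a filter that rarely suppresses a lookup can make the result negative. This only illustrates why a cheap, highly selective filter is needed; how the BAF design decides which accesses to filter is described in the full article, not in the abstract.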
