research-article

Branch penalty reduction on IBM cell SPUs via software branch hinting

Authors:
Jing Lu (Arizona State University, Tempe, AZ, USA)
Yooseong Kim (Arizona State University, Tempe, AZ, USA)
Aviral Shrivastava (Arizona State University, Tempe, AZ, USA)
Chuan Huang (Arizona State University, Tempe, AZ, USA)

CODES+ISSS '11: Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, October 2011, Pages 355–364. https://doi.org/10.1145/2039370.2039425

Published: 09 October 2011

ABSTRACT

As power efficiency becomes a paramount concern in processor design, architectures are emerging that do away with hardware branch prediction entirely and rely solely on software branch hinting. A popular example is the Synergistic Processing Unit (SPU) in the IBM Cell processor. To minimize the branch penalty using branch hint instructions, it is important not only to estimate branch probabilities (which has been studied before [6, 25, 24]) but also to insert branch hints carefully. To this end, in this paper we i) construct a branch penalty model for the compiler, ii) formulate the problem of minimizing branch penalty using branch hinting, and iii) propose a heuristic to solve this problem. The heuristic is based on three basic techniques that we introduce in this paper: NOP padding, hint pipelining, and nested loop restructuring. Experimental results on several benchmarks show that our solution can reduce the branch penalty by as much as 35.4% compared with the previous approach.
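
To make the idea of a compiler-side branch penalty model concrete, the following is a minimal toy sketch and not the model from the paper. It assumes a fixed misprediction penalty (`MISS_PENALTY`), a minimum hint-to-branch distance below which a hint has no effect (`MIN_SEPARATION`), and fall-through prediction for unhinted branches; both constants and both function names are hypothetical. It illustrates why NOP padding can pay off: padding costs one cycle per NOP but can make a hint usable.

```python
# Toy branch penalty model. All constants and names here are illustrative
# assumptions, not values or definitions taken from the paper.
MISS_PENALTY = 18    # assumed cycles lost when a branch is mispredicted
MIN_SEPARATION = 8   # assumed instructions needed between hint and branch


def expected_penalty(p_taken: float, separation: int, hinted: bool) -> float:
    """Expected penalty (cycles) per execution of one conditional branch."""
    if not hinted or separation < MIN_SEPARATION:
        # No usable hint: fall-through is assumed, so taken executions pay.
        return p_taken * MISS_PENALTY
    # Usable hint predicting "taken": only not-taken executions pay.
    return (1.0 - p_taken) * MISS_PENALTY


def best_nop_padding(p_taken: float, separation: int) -> tuple[bool, int, float]:
    """Decide whether to hint, and how many NOPs to pad before the branch.

    Returns (use_hint, nops, expected_cost), where expected_cost counts the
    padding NOPs (one cycle each, an assumption) plus the expected penalty.
    """
    unhinted_cost = expected_penalty(p_taken, separation, hinted=False)
    nops = max(0, MIN_SEPARATION - separation)
    hinted_cost = nops + expected_penalty(p_taken, separation + nops, hinted=True)
    if hinted_cost < unhinted_cost:
        return True, nops, hinted_cost
    return False, 0, unhinted_cost
```

Under these assumptions, a branch that is taken 90% of the time with only 5 instructions of separation is worth padding with 3 NOPs, while a branch taken 10% of the time is better left unhinted. A real SPU cost model would also have to account for instruction scheduling and for hints that interfere with each other, which this sketch ignores.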

References

[1] GNU Toolchain 4.1.1 and GDB for the Cell BE's PPU/SPU. http://www.bsc.es/plantillaH.php?cat_id=304.
[2] IBM Full-System Simulator for Cell BE. http://www.alphaworks.ibm.com/tech/cellsystemsim.
[3] A. Agarwal and M. Levy. The kill rule for multicore. In Proceedings of the 44th annual Design Automation Conference, DAC '07, pages 750–753, New York, NY, USA, 2007. ACM.
[4] K. Bai and A. Shrivastava. Heap Data Management for Limited Local Memory (LLM) Multi-core Processors. In CODES+ISSS '10: Proceedings of the 23th international symposium on System Synthesis, New York, NY, USA, 2010. ACM Press.
[5] K. Bai, A. Shrivastava, and S. Kudchadker. Stack Data Management for Limited Local Memory (LLM) Multi-core Processors. In Proceedings of the International Conference on Application Specific Systems, Architectures and Processors (ASAP), 2011.
[6] T. Ball and J. R. Larus. Branch prediction for free. In Proceedings of PLDI, pages 300–313, New York, NY, USA, 1993. ACM.
[7] M. Briejer, C. Meenderinck, and B. Juurlink. Extending the Cell SPE with energy efficient branch prediction. In Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, EuroPar'10, pages 304–315, Berlin, Heidelberg, 2010. Springer-Verlag.
[8] A. E. Eichenberger, K. O'Brien, K. O'Brien, P. Wu, T. Chen, P. H. Oden, D. A. Prener, J. C. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang, P. Zhao, and M. Gschwind. Optimizing Compiler for the CELL Processor. In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, PACT '05, pages 161–172, Washington, DC, USA, 2005. IEEE Computer Society.
[9] M. Gschwind, H. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in Cell's multicore architecture. IEEE Micro, 26(2):10–24, 2006.
[10] J. Gustafsson, A. Betts, A. Ermedahl, and B. Lisper. The Mälardalen WCET benchmarks - past, present and future. pages 137–147, Brussels, Belgium, July 2010. OCG.
[11] H. Hofstee. Power efficient processor architecture and the Cell processor. In High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on, pages 258–262, 2005.
[12] IBM. Cell Broadband Engine Programming Handbook including PowerXCell 8i. https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/7A77CCDF14FE7%0D5852575CA0074E8ED.
[13] IBM. IBM Cell SDK 3.1. http://www.ibm.com/developerworks/power/cell.
[14] Dual-Core Intel Itanium Processor 9000 and 9100 Series. http://download.intel.com/design/itanium/downloads/314054.pdf, 2007.
[15] D. A. Jiménez and C. Lin. Dynamic branch prediction with perceptrons. In HPCA '01: Proceedings of the 7th International Symposium on High-Performance Computer Architecture, page 197, Washington, DC, USA, 2001. IEEE Computer Society.
[16] S. C. Jung, A. Shrivastava, and K. Bai. Dynamic code mapping for limited local memory systems. In Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on, pages 13–20, 2010.
[17] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy. Introduction to the cell multiprocessor. IBM J. Res. Dev., 49:589–604, July 2005.
[18] J. Kalamatianos and D. R. Kaeli. Improving the accuracy of indirect branch prediction via branch classification. SIGARCH Comput. Archit. News, 27(1):23–26, 1999.
[19] D. Kolson, A. Nicolau, and N. Dutt. Elimination of redundant memory traffic in high-level synthesis. IEEE Trans. on Comp-aided Design, 15:1354–1363, 1996.
[20] P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-way multithreaded Sparc processor. IEEE Micro, 25(2):21–29, 2005.
[21] A. Pabalkar, A. Shrivastava, A. Kannan, and J. Lee. SDRM: Simultaneous Determination of Regions and Function-to-Region Mapping for Scratchpad Memories. In HIPC 2008: International Conference on High Performance Computing, pages 569–582, 2008.
[22] B. Sinharoy and S. W. White. Use of software hint for branch prediction in the absence of hint bit in the branch instruction. http://www.freepatentsonline.com/6971000.html.
[23] A. S. Stephen, S. Felix, V. Krishnan, and Y. Sazeides. Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor. In 29th Annual International Symposium on Computer Architecture, pages 295–306, 2002.
[24] T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate static estimators for program optimization. In Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, PLDI '94, pages 85–96, New York, NY, USA, 1994. ACM.
[25] Y. Wu and J. R. Larus. Static branch frequency and program profile analysis. In Proceedings of the 27th annual international symposium on Microarchitecture, pages 1–11, New York, NY, USA, 1994. ACM.

Index Terms

Branch penalty reduction on IBM cell SPUs via software branch hinting
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published in
CODES+ISSS '11: Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
October 2011
402 pages
ISBN: 9781450307154
DOI: 10.1145/2039370
Program Chairs: Robert P. Dick (University of Michigan), Jan Madsen (Technical University of Denmark)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
branch hint
cell processor
compiler optimization