research-article

If-Convert as Early as You Must

Authors:
Dorit Nuzman, Mobileye, Haifa, Israel (ORCID: 0009-0009-4761-5022)
Ayal Zaks, Mobileye, Haifa, Israel (ORCID: 0009-0008-8238-3135)
Ziv Ben-Zion, Mobileye, Haifa, Israel (ORCID: 0009-0002-3441-6496)

CC 2024: Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, February 2024, Pages 26–38. https://doi.org/10.1145/3640537.3641562

Published: 20 February 2024

ABSTRACT

Optimizing compilers employ a rich set of transformations that generate highly efficient code for a variety of source languages and target architectures. These transformations typically operate on general control-flow constructs, which trigger a range of optimization opportunities such as moving code to less frequently executed paths. Regular loop nests are specifically relevant for accelerating certain domains, leveraging architectural features including vector instructions, hardware-controlled loops and data flows, provided their internal control flow is eliminated. Compilers typically apply predicating if-conversion late, in their backend, to remove control flow undesired by the target. Until then, transformations triggered by control-flow constructs that are destined to be removed may end up doing more harm than good. We present an approach that leverages the existing powerful and general optimization flow of LLVM when compiling for targets without control flow in loops. Rather than trying to teach various transformations how to avoid misoptimizing for such targets, we propose to introduce an aggressive if-conversion pass as early as possible, along with carefully addressing pass-ordering implications. This solution outperforms the traditional compilation flow with only a modest tuning effort, thereby offering a robust and promising compilation approach for branch-restricted targets.
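
As a rough illustration of what if-conversion does to a loop body, the sketch below contrasts a loop containing an if/else diamond with an equivalent form in which both arms are computed unconditionally and a predicate selects the result. It is not taken from the paper and does not model LLVM: the function names and the clamping example are hypothetical, chosen only to show control dependence being turned into data dependence.

```python
# Illustrative sketch only; names and the example computation are
# assumptions, not the authors' pass or any LLVM API.

def clamp_sum_branchy(values, limit):
    """Loop whose body contains internal control flow (an if/else diamond)."""
    total = 0
    for v in values:
        if v > limit:
            total += limit
        else:
            total += v
    return total


def clamp_sum_if_converted(values, limit):
    """Same loop after if-conversion: the body is one straight-line block.

    Both arms are evaluated unconditionally and the predicate picks one,
    which is what a select or a predicated instruction would do.
    """
    total = 0
    for v in values:
        pred = v > limit      # predicate replaces the branch condition
        then_val = limit      # value produced by the 'then' arm
        else_val = v          # value produced by the 'else' arm
        # Arithmetic select: a bool acts as 0 or 1 when multiplied by an int.
        total += pred * then_val + (not pred) * else_val
    return total
```

Both functions return the same result for any list of integers; only the second has a branch-free loop body, which is the shape a target without control flow in loops would require.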

Published in
CC 2024: Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction
February 2024
261 pages
ISBN:9798400705076
DOI:10.1145/3640537
General Chair: Gabriel Rodríguez (Universidade da Coruña, Spain)
Program Chairs: P. Sadayappan (University of Utah, USA), Aravind Sukumaran-Rajam (Meta, USA)
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DAE
DSP
Decoupled Access Execute
If-Conversion
Phase-Ordering
Predication
Zero Overhead Loop
Qualifiers
- research-article