Vectorizing programs with IF-statements for processors with SIMD extensions

Sun, Huihui; Gorlatch, Sergei; Zhao, Rongcai

doi:10.1007/s11227-019-03057-4

Published: 11 November 2019

Volume 76, pages 4731–4746, (2020)

The Journal of Supercomputing

Abstract

Vectorization of programs is crucial for achieving high performance on modern processors with SIMD (Single Instruction Multiple Data) extensions. Programs with IF-statements suffer from control flow divergence, which seriously complicates automatic vectorization. Contemporary compilers therefore employ IF-conversion, which converts control flow to data flow and relies on predicated execution techniques (i.e., masked or select SIMD instructions). In this paper, we enhance the compiler’s capabilities to generate efficiently vectorized code for processors without masked instructions. We improve the state of the art in program vectorization by developing a novel approach, the IF-select transformation, which is applicable to arbitrarily nested IF-statements. We implement our approach in the open-source Open64 compiler and evaluate its performance on the SW26010 processor, which is used in the Sunway TaihuLight supercomputer (#3 in the TOP500 list at the time of writing) and does not support masked instructions. We extend our vectorization approach with an additional LLVM optimization pass that reduces the number of masked memory accesses on processors without masked instructions, e.g., IBM Power8 and ARM Cortex-A8. Experimental results demonstrate the performance advantages of the suggested vectorization techniques.
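As a rough illustration of the general idea behind IF-conversion with select operations, the following C sketch rewrites a loop with nested IF-statements into branch-free form. It is not the IF-select transformation from the paper, whose details are not given in the abstract. The function names (clamp_branch, clamp_select, select_i32) and the clamping computation are hypothetical, chosen only for this example, and the mask arithmetic assumes two's-complement integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pick if_true where mask is all ones (-1),
 * if_false where mask is all zeros. Assumes two's complement. */
static int32_t select_i32(int32_t mask, int32_t if_true, int32_t if_false) {
    return (if_true & mask) | (if_false & ~mask);
}

/* Original form: nested IF-statements, so control flow can differ
 * from one element to the next. */
void clamp_branch(int32_t *out, const int32_t *x, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > 0) {
            if (x[i] > limit)
                out[i] = limit;
            else
                out[i] = x[i];
        } else {
            out[i] = 0;
        }
    }
}

/* IF-converted form: both conditions become masks, every candidate
 * value is computed unconditionally, and selects combine them from
 * the innermost IF outwards. The loop body is straight-line code. */
void clamp_select(int32_t *out, const int32_t *x, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; ++i) {
        int32_t outer = -(int32_t)(x[i] > 0);      /* -1 if true, else 0 */
        int32_t inner = -(int32_t)(x[i] > limit);  /* -1 if true, else 0 */
        int32_t nested = select_i32(inner, limit, x[i]);
        out[i] = select_i32(outer, nested, 0);
    }
}
```

Both functions produce the same output for the same input. The second form evaluates every branch for every element, which is only safe here because the branch bodies have no side effects and cannot fault.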


Acknowledgements

This research is supported by the Chinese Scholarship Council (CSC) scholarship, and by the German Federal Ministry of Education and Research (BMBF) in the Project HPC²SE. Thanks are due to the National Supercomputing Center in Wuxi/China for providing access to the Sunway TaihuLight Supercomputer.

Author information

Authors and Affiliations

University of Münster, Münster, Germany
Huihui Sun & Sergei Gorlatch
National Digital Switching System Engineering and Technological Research Center, Zhengzhou, China
Rongcai Zhao

Corresponding author

Correspondence to Huihui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, H., Gorlatch, S. & Zhao, R. Vectorizing programs with IF-statements for processors with SIMD extensions. J Supercomput 76, 4731–4746 (2020). https://doi.org/10.1007/s11227-019-03057-4

Published: 11 November 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11227-019-03057-4
