
Vectorizing programs with IF-statements for processors with SIMD extensions

The Journal of Supercomputing

Abstract

Vectorization of programs is crucial for achieving high performance on modern processors with SIMD (Single Instruction Multiple Data) extensions. Programs with IF-statements suffer from control-flow divergence, which seriously complicates automatic vectorization. Contemporary compilers therefore employ IF-conversion to turn control flow into data flow; this approach relies on predicated execution, i.e., masked or select SIMD instructions. In this paper, we enhance the compiler's ability to generate efficiently vectorized code for processors without masked instructions. We improve the state of the art in program vectorization by developing a novel approach, the IF-select transformation, which is applicable to arbitrarily nested IF-statements. We implement our approach in the open-source Open64 compiler and evaluate its performance on the SW26010 processor, which is used in the Sunway TaihuLight supercomputer (#3 in the TOP500 list at the time of writing) and does not support masked instructions. We extend our vectorization approach with an additional LLVM optimization pass that reduces the number of masked memory accesses on processors without masked instructions, e.g., IBM Power8 and ARM Cortex-A8. Experimental results demonstrate the performance advantages of the suggested vectorization techniques.


Acknowledgements

This research is supported by a China Scholarship Council (CSC) scholarship and by the German Federal Ministry of Education and Research (BMBF) in the project HPC2SE. Thanks are due to the National Supercomputing Center in Wuxi, China, for providing access to the Sunway TaihuLight supercomputer.

Author information

Correspondence to Huihui Sun.

Cite this article

Sun, H., Gorlatch, S. & Zhao, R. Vectorizing programs with IF-statements for processors with SIMD extensions. J Supercomput 76, 4731–4746 (2020). https://doi.org/10.1007/s11227-019-03057-4
