Dynamic SIMD Vector Lane Scheduling

  • Conference paper
  • High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9945)

Abstract

A classical technique to vectorize code that contains control flow is control-flow to data-flow conversion. In that approach, statements are augmented with masks that denote whether a given vector lane participates in the statement's execution or idles. If the scheduling of work to vector lanes is performed statically, some of the vector lanes will run idle in case of control-flow divergence or varying work intensities across the loop iterations. With an increasing number of vector lanes, the likelihood of divergence or heavily unbalanced work assignments increases, and static scheduling leads to poor resource utilization. In this paper, we investigate different approaches to dynamic SIMD vector lane scheduling, using the Mandelbrot set algorithm as a test case. To overcome the limitations of static scheduling, idle vector lanes are assigned work items dynamically, thereby minimizing per-lane idle cycles. Our evaluation on the Knights Corner and Knights Landing platforms shows that our approaches can lead to considerable performance gains over a static work assignment. By using the AVX-512 vector compress and expand instructions, we are able to improve the scheduling further.



Acknowledgements

This work has been funded by SAXonPHI – Intel Parallel Computing Center Dresden at the Center for Information Services and High Performance Computing, TU Dresden, by the Research Center for Many-core HPC (IPCC) at Zuse Institute Berlin, and by the Intel Parallel Computing Center at RWTH Aachen University.

Author information

Corresponding author

Correspondence to Olaf Krzikalla.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Krzikalla, O., Wende, F., Höhnerbach, M. (2016). Dynamic SIMD Vector Lane Scheduling. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)
