Dynamic SIMD Vector Lane Scheduling

  • Conference paper
  • High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9945)

Abstract

A classical technique to vectorize code that contains control flow is control-flow to data-flow conversion. In that approach, statements are augmented with masks that denote whether a given vector lane participates in the statement's execution or idles. If the scheduling of work to vector lanes is performed statically, some of the vector lanes will run idle in case of control-flow divergence or varying work intensities across the loop iterations. With an increasing number of vector lanes, the likelihood of divergence or heavily unbalanced work assignments increases, and static scheduling leads to poor resource utilization. In this paper, we investigate different approaches to dynamic SIMD vector lane scheduling, using the Mandelbrot set algorithm as a test case. To overcome the limitations of static scheduling, idle vector lanes are assigned work items dynamically, thereby minimizing per-lane idle cycles. Our evaluation on the Knights Corner and Knights Landing platforms shows that our approaches can lead to considerable performance gains over a static work assignment. By using the AVX-512 vector compress and expand instructions, we are able to improve the scheduling further.



Acknowledgements

This work has been funded by SAXonPHI – Intel Parallel Computing Center Dresden at the Center for Information Services and High Performance Computing, TU Dresden, by the Research Center for Many-core HPC (IPCC) at Zuse Institute Berlin, and by the Intel Parallel Computing Center at RWTH Aachen University.

Author information

Corresponding author

Correspondence to Olaf Krzikalla.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Krzikalla, O., Wende, F., Höhnerbach, M. (2016). Dynamic SIMD Vector Lane Scheduling. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)
