Optimizations of the Whole Function Vectorization Based on SIMD Characteristics

  • Conference paper
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Abstract

Vectorization for SIMD extensions resembles programming for CUDA/OpenCL on GPU platforms: both follow the Single Program Multiple Data (SPMD) programming model. However, SIMD extensions and GPU accelerators differ in many respects, such as memory access and divergence handling, so existing methods leave optimization opportunities unexploited when they are used to vectorize for SIMD extensions. In this paper, we therefore propose a whole function vectorization optimization algorithm based on SIMD characteristics. First, we analyze the SIMD characteristics that may affect whole function vectorization; these include instance versioning, instance regrouping, and SIMD code optimization. We then implement a SIMD characteristics-based algorithm for whole function vectorization. In addition, we introduce a directive-based method that helps to fully exploit opportunities for this kind of vectorization. We evaluate our technique on nine benchmarks drawn from multimedia and image processing applications. Compared with the unoptimized code, the proposed technique achieves an average speedup of 1.59× on an E5-2600 processor.
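To make the notions of SPMD instances and divergence concrete, the following minimal sketch shows generic whole function vectorization of a branching kernel. It is not the algorithm proposed in the paper: the kernel body, the lane width `W`, and the names `kernel_scalar`, `kernel_wfv`, and `run` are all hypothetical, and plain Python lists stand in for SIMD registers.

```python
# Illustrative sketch only; every name here is an assumption, not the paper's API.
W = 4  # assumed SIMD width (number of lanes)


def kernel_scalar(x):
    """Scalar SPMD-style kernel: one instance processes one element."""
    if x < 0:
        return -x * 2
    return x + 1


def kernel_wfv(xs):
    """Vectorized form of kernel_scalar: W instances execute in lockstep.

    The divergent branch is if-converted: both sides are evaluated for
    every lane, and a per-lane mask selects the result (a blend).
    """
    assert len(xs) == W
    mask = [x < 0 for x in xs]        # per-lane branch condition
    then_vals = [-x * 2 for x in xs]  # "then" side, computed for all lanes
    else_vals = [x + 1 for x in xs]   # "else" side, computed for all lanes
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


def run(data):
    """Process W elements per step; leftover elements use the scalar kernel."""
    out = []
    full = len(data) - len(data) % W
    for i in range(0, full, W):
        out.extend(kernel_wfv(data[i:i + W]))
    out.extend(kernel_scalar(x) for x in data[full:])
    return out
```

Because both branch sides run for every lane, divergent inputs waste work on a SIMD unit. That cost is the kind of SIMD-specific overhead that motivates the optimizations named in the abstract.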

Corresponding author

Correspondence to Yingying Li.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Li, Y., Gao, Y., Wang, D., Li, Y., Xu, J. (2017). Optimizations of the Whole Function Vectorization Based on SIMD Characteristics. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_14

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
