Optimizations of the Whole Function Vectorization Based on SIMD Characteristics

Li, Yingying; Gao, Yuchen; Wang, Dong; Li, Yanbing; Xu, Jinlong

doi:10.1007/978-981-10-6442-5_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Included in the following conference series:

International Symposium on Parallel Architecture, Algorithm and Programming

Abstract

Vectorization for SIMD extensions is similar to programming for CUDA/OpenCL on GPU platforms: both follow the Single Program Multiple Data (SPMD) programming model. However, SIMD extensions and GPU accelerators differ in many respects, such as memory access and divergence, so existing methods leave optimization opportunities unexploited when they are used to vectorize for SIMD extensions. In this paper, we therefore propose a whole-function vectorization optimization algorithm based on SIMD characteristics. First, we analyze the SIMD characteristics that may affect whole-function vectorization, namely instance versioning, instance regrouping, and SIMD code optimization. We then implement a SIMD-characteristics-based algorithm for whole-function vectorization. In addition, we introduce a directive-based method that helps fully exploit these vectorization opportunities. We evaluate our technique on nine benchmarks from multimedia and image processing applications. Compared with the unoptimized code, the proposed technique achieves an average speedup of 1.59x on an E5-2600 processor.
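
To make the idea of whole-function vectorization and divergence concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It models SIMD lanes as plain Python lists, so it shows the control-flow transformation only, not real vector instructions. The width `W`, the kernel body, and all function names are assumptions introduced for illustration. The uniform-mask fast path in `kernel_wfv_versioned` is one possible way to avoid computing a branch side that no lane needs; it is not claimed to match the paper's instance versioning.

```python
W = 4  # assumed SIMD width (lanes per vector); illustrative, not from the paper


def kernel_scalar(x):
    """Scalar SPMD kernel: one instance with a data-dependent branch."""
    if x > 0.0:
        return x * 2.0
    return -x


def kernel_wfv(xs):
    """Run W instances of the kernel together, lane by lane.

    The branch is if-converted: both sides are computed for every lane,
    and a per-lane mask selects which result each lane keeps (a blend).
    """
    assert len(xs) == W
    mask = [x > 0.0 for x in xs]        # models a vector compare
    then_vals = [x * 2.0 for x in xs]   # 'then' side for all lanes
    else_vals = [-x for x in xs]        # 'else' side for all lanes
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


def kernel_wfv_versioned(xs):
    """Hypothetical fast path: if all lanes agree, compute one side only."""
    assert len(xs) == W
    mask = [x > 0.0 for x in xs]
    if all(mask):                       # no divergence: every lane takes 'then'
        return [x * 2.0 for x in xs]
    if not any(mask):                   # no divergence: every lane takes 'else'
        return [-x for x in xs]
    return kernel_wfv(xs)               # divergent lanes: fall back to blending


if __name__ == "__main__":
    lanes = [1.0, -2.0, 3.0, -4.0]
    print(kernel_wfv(lanes))
    print(kernel_wfv_versioned([1.0, 2.0, 3.0, 4.0]))
```

In the divergent case the blended version does the work of both branch sides for every lane, which is the kind of overhead that motivates treating SIMD divergence differently from GPU divergence.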

Author information

Authors and Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, 450001, China
Yingying Li, Yuchen Gao, Dong Wang, Yanbing Li & Jinlong Xu

Corresponding author

Correspondence to Yingying Li.

Editor information

Editors and Affiliations

Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
Guoliang Chen
Sun Yat-sen University, Guangzhou, Guangdong, China
Hong Shen
Hainan University, Haikou, Hainan, China
Mingrui Chen

About this paper

Cite this paper

Li, Y., Gao, Y., Wang, D., Li, Y., Xu, J. (2017). Optimizations of the Whole Function Vectorization Based on SIMD Characteristics. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_14

DOI: https://doi.org/10.1007/978-981-10-6442-5_14
Published: 06 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6441-8
Online ISBN: 978-981-10-6442-5
eBook Packages: Computer Science, Computer Science (R0)
