Abstract
This paper evaluates the auto-vectorizing capabilities of modern optimizing compilers (Intel C/C++, GCC C/C++, LLVM/Clang, and PGI C/C++) on the Intel 64 and Intel Xeon Phi architectures. We use the Extended Test Suite for Vectorizing Compilers, which consists of 151 loops. We estimate speedup by running the loops in scalar and vector modes for different data types, and we determine the loop classes that the compilers in the study fail to vectorize. Our experiments use a dual-CPU NUMA system (2 x Intel Xeon E5-2620v4, Intel Broadwell microarchitecture) with an Intel Xeon Phi 3120A co-processor.
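To make the scalar-versus-vector comparison concrete, the following minimal C sketch shows one loop with independent iterations, one loop with a loop-carried dependence, and the speedup ratio. The function names (`add_arrays`, `prefix_sum`, `speedup`) are hypothetical and are not taken from the test suite; the loops only illustrate the kind of distinction such a suite probes.

```c
#include <stddef.h>

/* Hypothetical example: every iteration is independent, so a
   vectorizing compiler may process several elements per instruction.
   The restrict qualifiers promise that the arrays do not overlap. */
void add_arrays(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hypothetical example: a[i] depends on a[i - 1] computed in the
   previous iteration. This loop-carried dependence usually blocks
   straightforward vectorization. */
void prefix_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Speedup as the ratio of scalar run time to vector run time;
   values above 1.0 mean the vector build ran faster. */
double speedup(double t_scalar, double t_vector)
{
    return t_scalar / t_vector;
}
```

One way to obtain the two timings, assuming GCC, is to build the same source once with `-O3` and once with `-O3 -fno-tree-vectorize`, then time each loop over many repetitions. These flags are an assumption for illustration and may differ from the configuration used in the paper.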
This work is supported by the Russian Foundation for Basic Research (projects 15-07-00048, 16-07-00712).
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Moldovanova, O.V., Kurnosov, M.G. (2017). Auto-Vectorization of Loops on Intel 64 and Intel Xeon Phi: Analysis and Evaluation. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2017. Lecture Notes in Computer Science, vol 10421. Springer, Cham. https://doi.org/10.1007/978-3-319-62932-2_13
DOI: https://doi.org/10.1007/978-3-319-62932-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62931-5
Online ISBN: 978-3-319-62932-2
eBook Packages: Computer Science, Computer Science (R0)