Abstract
Almost all modern processors integrate SIMD extensions, which deliver significant speedups for multimedia and scientific programs. SIMD vector registers have grown steadily: from 64 bits in MMX to 128 bits in SSE and 256 bits in AVX, and the newer Intel Many Integrated Core (MIC) architecture supports 512-bit SIMD. Although a higher speedup is theoretically possible as the vector length increases, more complex and efficient instructions are required to support vectorization. We analyze the vectorization performance of NPB and SPEC CPU2006 as the vector length increases and across the SSE, AVX, and IMCI instruction sets, and on that basis offer some advice on vector length and instruction set design.
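The relationship between register width and theoretical speedup can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the names `REGISTER_BITS`, `lanes`, and `lane_table` are hypothetical, and the lane count is only an idealized upper bound on speedup that ignores memory bandwidth, alignment, and instruction support. It also treats element width generically, although MMX itself operates on packed integers only.

```python
# Register widths in bits, as listed in the abstract.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256, "IMCI": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of the given width fit in one vector register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


def lane_table(element_bits: int) -> dict:
    """Map each extension name to its lane count for one element width."""
    return {name: lanes(bits, element_bits) for name, bits in REGISTER_BITS.items()}


if __name__ == "__main__":
    # Idealized bound: one vector instruction handles `lanes` elements at once.
    for name, count in lane_table(32).items():
        print(f"{name}: {count} x 32-bit lanes")
```

Under these assumptions, doubling the register width doubles the number of lanes, which is why the theoretical ceiling rises with each generation even when measured speedups do not.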
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhao, B., Gao, W., Zhao, R., Han, L., Sun, H., Li, Y. (2015). Performance Evaluation of NPB and SPEC CPU2006 on Various SIMD Extensions. In: Wang, Y., Xiong, H., Argamon, S., Li, X., Li, J. (eds) Big Data Computing and Communications. BigCom 2015. Lecture Notes in Computer Science, vol 9196. Springer, Cham. https://doi.org/10.1007/978-3-319-22047-5_21
DOI: https://doi.org/10.1007/978-3-319-22047-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22046-8
Online ISBN: 978-3-319-22047-5
eBook Packages: Computer Science (R0)