Performance Evaluation of NPB and SPEC CPU2006 on Various SIMD Extensions

Zhao, Bo; Gao, Wei; Zhao, Rongcai; Han, Lin; Sun, Huihui; Li, Yingying

doi:10.1007/978-3-319-22047-5_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9196)

Included in the following conference series:

International Conference on Big Data Computing and Communications

Abstract

Nowadays, almost all processors integrate SIMD extensions, which deliver significant speedups for multimedia and scientific computing programs. The length of SIMD vector registers has increased steadily: from 64 bits in MMX to 128 bits in SSE and 256 bits in AVX, while the newer Intel Many Integrated Core (MIC) architecture supports 512-bit SIMD. Although a higher speedup is theoretically possible as the vector length increases, more complex and efficient instructions are required to support vectorization. We analyze the vectorization performance of NPB and SPEC CPU2006 as the vector length increases across the SSE, AVX, and IMCI SIMD instruction sets, and on that basis offer some advice on vector length and instruction set design.
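
To make the "theoretically possible" speedup concrete, the following is a minimal sketch of the arithmetic, not the authors' evaluation method. It counts how many elements fit in one register of each width named in the abstract and applies an Amdahl-style upper bound. The function names, the 32-bit element width, and the 80% vectorizable fraction are illustrative assumptions; the model ignores alignment, memory bandwidth, and instruction-set limitations, which is exactly where measured results can fall short.

```python
# Illustrative sketch only: the names, the 32-bit element width, and the
# ideal-lane model are assumptions, not taken from the paper.

# Register widths in bits, as listed in the abstract.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256, "IMCI": 512}


def lanes(register_bits, element_bits):
    """Number of elements of the given width that fit in one vector register."""
    return register_bits // element_bits


def ideal_speedup(vector_fraction, n_lanes):
    """Amdahl-style upper bound when only `vector_fraction` of the
    scalar runtime can be vectorized across `n_lanes` lanes."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / n_lanes)


if __name__ == "__main__":
    for name, bits in REGISTER_BITS.items():
        n = lanes(bits, 32)
        bound = ideal_speedup(0.8, n)  # hypothetical 80% vectorizable code
        print(f"{name}: {n} lanes of 32-bit elements, bound {bound:.2f}x")
```

Under these assumptions the bound grows more slowly than the lane count, since the non-vectorized fraction dominates as registers widen.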

Author information

Authors and Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China
Bo Zhao, Wei Gao, Rongcai Zhao, Lin Han, Huihui Sun & Yingying Li

Corresponding author

Correspondence to Bo Zhao.

Editor information

Editors and Affiliations

University of North Carolina at Charlotte, Charlotte, North Carolina, USA
Yu Wang
Rutgers Business School, Newark, New Jersey, USA
Hui Xiong
Illinois Institute of Technology, Chicago, Illinois, USA
Shlomo Argamon
Illinois Institute of Technology, Chicago, Illinois, USA
XiangYang Li
Harbin Institute of Technology, Harbin, China
JianZhong Li

Cite this paper

Zhao, B., Gao, W., Zhao, R., Han, L., Sun, H., Li, Y. (2015). Performance Evaluation of NPB and SPEC CPU2006 on Various SIMD Extensions. In: Wang, Y., Xiong, H., Argamon, S., Li, X., Li, J. (eds) Big Data Computing and Communications. BigCom 2015. Lecture Notes in Computer Science, vol 9196. Springer, Cham. https://doi.org/10.1007/978-3-319-22047-5_21

DOI: https://doi.org/10.1007/978-3-319-22047-5_21
Published: 24 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22046-8
Online ISBN: 978-3-319-22047-5
eBook Packages: Computer Science, Computer Science (R0)
