Performance and energy impact of parallelization and vectorization techniques in modern microprocessors

Cebrián, Juan M.; Natvig, Lasse; Meyer, Jan Christian

doi:10.1007/s00607-013-0366-5

Published: 16 November 2013

Computing, volume 96, pages 1179–1193 (2014)

Juan M. Cebrián¹,
Lasse Natvig¹ &
Jan Christian Meyer²

Abstract

While Moore’s law states that the number of transistors is approximately doubled every 2 years, powering these transistors simultaneously is only possible as long as Dennard scaling continues. Unfortunately, voltage scaling has slowed down in recent years, and microprocessor designers have hit what is known as the “utilization wall” or the “dark silicon” effect. Vectorization, parallelization, specialization and heterogeneity are the key approaches to deal with this utilization wall. However, how software developers can maximize energy efficiency of these architectures remains an open question. This paper presents an energy evaluation of parallelization using both physical and logical cores (i.e., SMT/Hyper-Threading), vectorization (SSE, Advanced Vector Extensions and NEON) and dynamic core reconfiguration (Intel®’s Turbo Boost Technology, TBT). The evaluation spans microprocessors for embedded, laptop, desktop and server markets, since there is a convergence among them towards energy efficiency. The analyzed processors include Intel’s Core™ i5 and i7 family and ARM®’s Cortex™ A9 and A15. Results show that software developers should prioritize vectorization over thread parallelism when possible, as it yields better energy efficiency, especially on the Intel platforms. Application scalability can be reduced drastically when using vectorization and threading simultaneously since vectorization increases pressure on the memory subsystem. Intel’s TBT further improves energy efficiency by an additional 10–20% depending on the number of active threads.
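
To make the energy-efficiency comparison concrete, the following is a minimal sketch and not the authors' methodology. It assumes only that energy equals average power multiplied by runtime. The function name `relative_energy` and every speedup and power figure are hypothetical placeholders, not measurements from the paper.

```python
def relative_energy(speedup: float, power_ratio: float) -> float:
    """Energy of an optimized run relative to the baseline run.

    Energy = average power x runtime. If runtime shrinks by `speedup`
    and average power grows by `power_ratio`, energy scales by
    power_ratio / speedup. Values below 1.0 mean less energy than baseline.
    """
    if speedup <= 0 or power_ratio <= 0:
        raise ValueError("speedup and power_ratio must be positive")
    return power_ratio / speedup


# Hypothetical figures for illustration only (not results from the paper).
scenarios = {
    "4 threads, scalar": {"speedup": 3.5, "power_ratio": 2.8},
    "1 thread, vectorized": {"speedup": 3.0, "power_ratio": 1.2},
}

for name, s in scenarios.items():
    energy = relative_energy(s["speedup"], s["power_ratio"])
    print(f"{name}: {energy:.2f}x baseline energy")
```

Under these made-up numbers the vectorized run is slightly slower than the threaded one yet consumes half its energy, because its average power rises far less. This is the kind of trade-off behind the recommendation to prioritize vectorization when possible.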

Acknowledgments

The authors gratefully acknowledge the support of the PRACE 2IP project, the NOTUR project, and the HiPEAC Network of Excellence.

Author information

Authors and Affiliations

Department of Computer and Information Science (IDI), NTNU, 7491 Trondheim, Norway
Juan M. Cebrián & Lasse Natvig
High Performance Computing Section, IT Department, NTNU, 7491 Trondheim, Norway
Jan Christian Meyer

Corresponding author

Correspondence to Juan M. Cebrián.

About this article

Cite this article

Cebrián, J.M., Natvig, L. & Meyer, J.C. Performance and energy impact of parallelization and vectorization techniques in modern microprocessors. Computing 96, 1179–1193 (2014). https://doi.org/10.1007/s00607-013-0366-5

Received: 15 March 2013
Accepted: 29 October 2013
Published: 16 November 2013
Issue Date: December 2014
DOI: https://doi.org/10.1007/s00607-013-0366-5

Mathematics Subject Classification

68-04
