Performance and power consumption analysis of Arm Scalable Vector Extension

Odajima, Tetsuya; Kodama, Yuetsu; Sato, Mitsuhisa

doi:10.1007/s11227-020-03495-5

Published: 10 November 2020

Volume 77, pages 5757–5778, (2021)

The Journal of Supercomputing

Abstract

Modern CPUs not only have multiple cores but also support wide single instruction, multiple data (SIMD) operations, and this trend is expected to continue. In this paper, we examine how the vector length and the number of out-of-order resources affect the performance and power consumption of programs executed with multiple vector lengths using the Arm Scalable Vector Extension. Based on our evaluation, we conclude that using a longer vector length with multicycle vector units yields up to approximately 30% better performance and 21% lower power consumption than using a shorter vector length.

Acknowledgements

This work is partially funded by MEXT’s program for the Development and Improvement for the Next Generation Ultra High-Speed Computer System, under its Subsidies for Operating the Specific Advanced Large Research Facilities.

Author information

Authors and Affiliations

RIKEN Center for Computational Science, Kobe, Japan
Tetsuya Odajima, Yuetsu Kodama & Mitsuhisa Sato

Corresponding author

Correspondence to Tetsuya Odajima.

About this article

Cite this article

Odajima, T., Kodama, Y. & Sato, M. Performance and power consumption analysis of Arm Scalable Vector Extension. J Supercomput 77, 5757–5778 (2021). https://doi.org/10.1007/s11227-020-03495-5

Accepted: 26 October 2020
Published: 10 November 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11227-020-03495-5
