Performance and energy consumption of the SIMD Gram–Schmidt process for vector orthogonalization

Jakobs, Thomas; Naumann, Billy; Rünger, Gudula

doi:10.1007/s11227-019-02839-0

Published: 05 April 2019

Volume 76, pages 1999–2021, (2020)

The Journal of Supercomputing

Abstract

In linear algebra and numerical computing, the orthogonalization of a set of vectors is an important building block, so an efficient implementation on recent architectures is needed to provide a useful kernel for high-performance applications. In this article, we consider orthogonalizing a set of vectors with the Gram–Schmidt method and develop SIMD implementations for processors that provide the Advanced Vector Extensions (AVX), a set of instructions for SIMD execution on recent Intel and AMD CPUs. Several SIMD implementations of the Gram–Schmidt process are built, and their behavior with respect to performance and energy is investigated. In particular, different ways to implement the SIMD programs are proposed, and several optimizations are studied. The hardware platforms are Intel Core, Xeon and Xeon Phi processors with the AVX versions AVX, AVX2 and AVX-512.
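For readers unfamiliar with the kernel, the following is a minimal scalar sketch of the modified Gram–Schmidt process in plain Python. It is not the authors' implementation and uses no AVX instructions. The helper names `dot`, `axpy` and `gram_schmidt`, as well as the tolerance `tol`, are assumptions made for illustration. The two element-wise loops (the inner product and the scaled vector update) are the parts that a SIMD version would typically map onto vector registers.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors (a reduction loop)."""
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return y + alpha * x, computed element by element."""
    return [b + alpha * a for a, b in zip(x, y)]


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors` with the modified Gram-Schmidt process.

    Vectors that are numerically dependent on earlier ones are skipped.
    `tol` is an illustrative threshold, not a value from the article.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of the current remainder along q.
            w = axpy(-dot(q, w), q, w)
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([c / norm for c in w])
    return basis


if __name__ == "__main__":
    q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
    # Print the pairwise inner products; they should be close to the identity.
    for qi in q:
        print([round(dot(qi, qj), 12) for qj in q])
```

The sketch uses the modified rather than the classical variant because each projection is taken against the already updated remainder, which is generally the more numerically stable choice. Which variant the article vectorizes is not stated in the abstract.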

Acknowledgements

This work was supported by the German Ministry of Science and Education (BMBF) project SeASiTe, Grant No. 01IH16012B.

Author information

Authors and Affiliations

Faculty of Computer Science, Chemnitz University of Technology, Chemnitz, Germany
Thomas Jakobs, Billy Naumann & Gudula Rünger

Corresponding author

Correspondence to Thomas Jakobs.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jakobs, T., Naumann, B. & Rünger, G. Performance and energy consumption of the SIMD Gram–Schmidt process for vector orthogonalization. J Supercomput 76, 1999–2021 (2020). https://doi.org/10.1007/s11227-019-02839-0

Published: 05 April 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11227-019-02839-0
