Abstract
In this chapter, we explain the basic architecture and use of the linear algebra libraries BLAS and LAPACK. BLAS and LAPACK carry out vector and matrix operations on computers. They are used by many programs, and their implementations are optimized for the computers they run on. These libraries should be used whenever possible for linear algebra operations, because algorithms taken directly from mathematical theorems in textbooks may be inefficient and may not achieve sufficient accuracy in practice. Moreover, programming such algorithms is bothersome. However, performance may suffer if you use a non-optimized library. In fact, the performance difference between a non-optimized and an optimized implementation is likely very large, so you should choose the fastest one for your computer. The availability of optimized BLAS and LAPACK libraries has improved remarkably; for example, they are now included in Linux distributions such as Ubuntu. In this chapter, we refer to the libraries for Ubuntu 16.04 so that readers can easily try them out for themselves. Unfortunately, we do not cover GPU implementations for lack of space. However, the basic ideas are the same as those presented in this chapter, so we believe that readers will easily be able to utilize them as well.
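As a concrete taste of the usage discussed in this chapter, here is a minimal sketch of calling one representative BLAS routine, DGEMM (double-precision matrix-matrix multiply, C := alpha*A*B + beta*C), through the CBLAS interface. The file name and the 2×2 data are our own illustrative choices; build with, e.g., `gcc dgemm_example.c -lopenblas` after installing an OpenBLAS development package, or link a reference CBLAS if one is installed.

```c
/* dgemm_example.c -- a minimal sketch of calling BLAS DGEMM through
 * the CBLAS interface. Build (one possibility, with OpenBLAS installed):
 *   gcc dgemm_example.c -o dgemm_example -lopenblas
 */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* 2x2 matrices stored in row-major order. */
    double A[4] = {1.0, 2.0,
                   3.0, 4.0};
    double B[4] = {5.0, 6.0,
                   7.0, 8.0};
    double C[4] = {0.0, 0.0,
                   0.0, 0.0};

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* M, N, K       */
                1.0, A, 2,      /* alpha, A, lda */
                B, 2,           /* B, ldb        */
                0.0, C, 2);     /* beta, C, ldc  */

    /* Prints C = [19 22; 43 50]. */
    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```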
Notes
- 1.
Although Carl Friedrich Gauss is sometimes credited with the rediscovery, Isaac Newton, about 100 years earlier, wrote that the textbooks of his day lacked a method for solving simultaneous equations and proceeded to publish one that became widely circulated.
- 2.
There was once a time when the format varied from one manufacturer or vendor to another; data were not compatible between machines, and when a computer was replaced, it was necessary to change the program as well.
- 3.
In multicore environments, programs run in parallel as lightweight processes called “threads.” Because different threads can access the same memory area at the same time, conflicts may occur; variables private to a routine but shared across calls are one source of such conflicts. In LAPACK 3.3, all routines were made thread safe by removing such private variables.
- 4.
It looks very similar to the textbook implementation, except that we operate on sub-matrices instead of individual numbers. This algorithm exploits the hierarchical structure of the memory cache; it is also well suited to multicore CPUs because each sub-matrix \(C_{pq}\) can be computed independently (a sketch of this blocked scheme is shown after these notes).
- 5.
The situation before 2010 was quite chaotic, because the source code was hidden by vendors.
- 6.
The calculation becomes difficult when the clock frequency changes dynamically, as in the case of Intel Turbo Boost.
- 7.
AVX refers to Intel Advanced Vector Extensions, an extension of the SIMD-type instructions succeeding SSE. Its registers are 256 bits wide, enough to hold four double-precision values, and it can issue an addition and a multiplication in the same clock. Since each instruction operates on four values, it is possible to perform eight double-precision operations in one clock (a small intrinsics sketch is also shown after these notes).
- 8.
However, it is better to use a Xeon, despite it being more expensive than a Core i7, because it has more memory bandwidth.
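As mentioned in note 4, blocked matrix multiplication applies the textbook triple loop to cache-sized sub-matrices. Below is a minimal C sketch under our own simplifying assumptions: row-major n × n matrices, with n a multiple of the illustrative block size NB = 64 (real libraries tune the block size to the actual cache hierarchy). This is a model of the idea, not how an optimized BLAS is actually written.

```c
/* blocked_gemm.c -- a sketch of blocked matrix multiplication, C += A*B,
 * where all matrices are n x n in row-major order and, for simplicity,
 * n is assumed to be a multiple of the block size NB.
 * Each block C_pq accumulates sums of sub-matrix products A_pr * B_rq,
 * exactly like the textbook formula, but with blocks instead of numbers.
 */
#define NB 64  /* illustrative block size; tuned to the cache in real libraries */

void blocked_gemm(int n, const double *A, const double *B, double *C) {
    for (int p = 0; p < n; p += NB)          /* block row of C    */
        for (int q = 0; q < n; q += NB)      /* block column of C */
            for (int r = 0; r < n; r += NB)  /* inner block index */
                /* C_pq += A_pr * B_rq on one cache-sized block */
                for (int i = p; i < p + NB; i++)
                    for (int j = q; j < q + NB; j++) {
                        double s = C[i * n + j];
                        for (int k = r; k < r + NB; k++)
                            s += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = s;
                    }
}

/* Because distinct blocks C_pq are independent, the (p, q) loops can be
 * distributed across cores, e.g., with OpenMP. */
```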
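And as mentioned in note 7, one 256-bit AVX register holds four double-precision values, so one vector addition plus one vector multiplication amounts to eight floating-point operations. The following small sketch uses the standard AVX intrinsics from immintrin.h; the data values are arbitrary.

```c
/* avx_example.c -- four doubles per 256-bit AVX register.
 * Build: gcc -mavx avx_example.c -o avx_example
 */
#include <stdio.h>
#include <immintrin.h>

int main(void) {
    /* _mm256_set_pd lists elements from highest to lowest. */
    __m256d a = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);
    __m256d b = _mm256_set_pd(8.0, 7.0, 6.0, 5.0);

    __m256d sum  = _mm256_add_pd(a, b);  /* 4 additions                     */
    __m256d prod = _mm256_mul_pd(a, b);  /* 4 multiplications: 8 ops total  */

    double s[4], p[4];
    _mm256_storeu_pd(s, sum);
    _mm256_storeu_pd(p, prod);
    printf("sum  = %g %g %g %g\n", s[0], s[1], s[2], s[3]);
    printf("prod = %g %g %g %g\n", p[0], p[1], p[2], p[3]);
    return 0;
}
```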