Avoiding Communication through a Multilevel LU Factorization

Donfack, Simplice; Grigori, Laura; Khabou, Amal

doi:10.1007/978-3-642-32820-6_55

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7484)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Massively parallel computers are evolving towards deeper levels of parallelism and deeper memory hierarchies, and the ratio of the time required to transfer data (either through the memory hierarchy or between compute units) to the time required to perform floating-point operations is increasing exponentially. As a result, algorithms face two challenges: they must exploit multiple levels of parallelism, and they must reduce communication both between the compute units at each level of the parallelism hierarchy and between the levels of the memory hierarchy.

In this paper we present an algorithm for the LU factorization of dense matrices that is suited to computer systems with two levels of parallelism. The algorithm minimizes both the volume of communication and the number of messages transferred at every level of this two-level hierarchy. We present an implementation for a cluster of multicore processors based on MPI and Pthreads, and show that it performs better than the routines implementing the LU factorization in well-known numerical libraries. For tall and skinny matrices, that is, matrices with many more rows than columns, our algorithm outperforms the corresponding ScaLAPACK routine by a factor of 4.5 on a cluster of 32 nodes, each node having two quad-core Intel Xeon EM64T processors.
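The abstract does not say how pivot rows are chosen. For background only, the sketch below illustrates tournament pivoting, the panel-pivoting scheme of the communication-avoiding LU algorithm CALU by Grigori, Demmel and Xiang, on a tall and skinny matrix. Each block of rows selects b candidate pivot rows locally with Gaussian elimination with partial pivoting (GEPP), and candidate sets are then merged pairwise up a binary reduction tree. With P blocks the tree has about log2(P) levels, so the pivot search needs on the order of log P messages per panel rather than one reduction per column as in classic partial pivoting. The function names (`gepp_pivot_rows`, `select`, `tournament_pivoting`), the contiguous row-block layout and the binary tree shape are assumptions for illustration, not the paper's implementation. Plain Python lists stand in for MPI processes and Pthreads.

```python
# Illustrative sketch of tournament pivoting (CALU-style panel pivoting).
# Matrices are plain lists of rows; no MPI or threads are used here.


def gepp_pivot_rows(rows, b):
    """Indices (into `rows`) of the b pivot rows chosen by Gaussian
    elimination with partial pivoting (GEPP) on the first b columns."""
    a = [list(r[:b]) for r in rows]      # work on a copy of the first b columns
    idx = list(range(len(rows)))         # idx[k] = original position of row k
    for k in range(min(b, len(a))):
        # Partial pivoting: largest |a[i][k]| among the rows not yet chosen.
        p = max(range(k, len(a)), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        idx[k], idx[p] = idx[p], idx[k]
        if a[k][k] == 0:
            continue                     # column already zero: nothing to eliminate
        for i in range(k + 1, len(a)):
            f = a[i][k] / a[k][k]
            for j in range(k, b):
                a[i][j] -= f * a[k][j]
    return idx[:b]


def select(A, pool, b):
    """Run GEPP on the rows of A listed in `pool`; return the winners' indices."""
    return [pool[s] for s in gepp_pivot_rows([A[r] for r in pool], b)]


def tournament_pivoting(A, b, num_blocks):
    """Choose b pivot rows of the tall-skinny matrix A (m x b, m >= b*num_blocks)."""
    m = len(A)
    bounds = [m * i // num_blocks for i in range(num_blocks + 1)]
    # Leaves: each block (one per hypothetical process) proposes b candidates.
    candidates = [select(A, list(range(lo, hi)), b)
                  for lo, hi in zip(bounds, bounds[1:])]
    # Binary reduction tree: pairs of candidate sets play off against each other.
    while len(candidates) > 1:
        nxt = []
        for i in range(0, len(candidates), 2):
            if i + 1 < len(candidates):
                nxt.append(select(A, candidates[i] + candidates[i + 1], b))
            else:
                nxt.append(candidates[i])  # odd set advances unchanged
        candidates = nxt
    return candidates[0]


A = [[1, 0], [0, 1], [5, 0], [0, 7], [2, 3], [0, 0], [9, 1], [1, 8]]
print(tournament_pivoting(A, 2, 4))  # [6, 7]
```

In CALU, the selected rows are then permuted to the top of the panel, which is factored without further pivoting. In a two-level setting, one natural arrangement (an assumption here, not taken from the paper) is to assign consecutive blocks to the cores of one node, so that, when the number of blocks per node is a power of two, the lower levels of the tree stay inside a node and only the upper levels exchange candidates between nodes.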

Author information

Authors and Affiliations

INRIA Saclay-Ile de France, Laboratoire de Recherche en Informatique, Université Paris-Sud, France
Simplice Donfack, Laura Grigori & Amal Khabou

Editor information

Editors and Affiliations

University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece
Christos Kaklamanis
University of Patras, University Building B, 26504, Rio, Greece
Theodore Papatheodorou
Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece
Paul G. Spirakis

About this paper

Cite this paper

Donfack, S., Grigori, L., Khabou, A. (2012). Avoiding Communication through a Multilevel LU Factorization. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_55

DOI: https://doi.org/10.1007/978-3-642-32820-6_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)