
Parallel Computing

Volume 27, Issue 7, June 2001, Pages 913-923

External selective orthogonalization for the Lanczos algorithm in distributed memory environments

https://doi.org/10.1016/S0167-8191(01)00074-6

Abstract

The k-step explicit restart Lanczos algorithm, LExpRes, for the computation of a few of the extreme eigenpairs of large, usually sparse, symmetric matrices, computes one eigenpair at a time using a deflation technique in which each Lanczos vector generated is orthogonalized against all previously converged eigenvectors. The computation of the inner products associated with this external orthogonalization often creates a bottleneck in parallel distributed memory environments. In this paper, methods are proposed which significantly reduce this computational overhead in LExpRes, thereby improving its efficiency. The performance of these methods on the Cray-T3D and the Cray-T3E is assessed and critically compared with that of the original algorithm.

Introduction

The Lanczos algorithm is one of the principal methods for the computation of a few of the extreme eigenvalues and their corresponding eigenvectors of large, usually sparse, real symmetric matrices. Given a symmetric matrix A∈Rn×n, the standard Lanczos method generates a sequence of tridiagonal matrices Tj∈Rj×j and Lanczos vectors qj∈Rn with the properties that Tj−1∈R(j−1)×(j−1) is a leading principal submatrix of Tj=QjTAQj, where Qj=[q1,…,qj] is orthonormal, and that for j≪n the extreme eigenvalues of A are well approximated by the corresponding eigenvalues of the Lanczos matrices Tj [2].
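The three-term recurrence underlying this process can be sketched as follows; this is an illustrative NumPy fragment, not code from the paper, and it deliberately omits the reorthogonalization discussed below.

```python
import numpy as np

def lanczos(A, q1, m):
    """Run m steps of the standard Lanczos three-term recurrence.

    Returns the tridiagonal coefficients (alpha on the diagonal,
    beta on the off-diagonals of T_m) and the Lanczos basis
    Q = [q_1, ..., q_m].  No reorthogonalization is performed here,
    so in floating point orthogonality of Q degrades as m grows.
    """
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = q1 / np.linalg.norm(q1)
    q_prev = np.zeros(n)
    b = 0.0
    for j in range(m):
        Q[:, j] = q
        w = A @ q - b * q_prev        # three-term recurrence
        alpha[j] = q @ w              # diagonal entry of T_j
        w -= alpha[j] * q
        b = np.linalg.norm(w)
        if j + 1 < m:
            beta[j] = b               # off-diagonal entry of T_j
            q_prev, q = q, w / b
    return alpha, beta, Q
```

The extreme eigenvalues of the small tridiagonal matrix built from alpha and beta then approximate the extreme eigenvalues of A.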

However, one of the main drawbacks of the Lanczos method is that, when the classical three-term recurrence is used in finite precision arithmetic, orthogonality of the Lanczos vectors is lost. Consequently, spurious eigenvalues are generated and the process fails to terminate. A number of approaches have been suggested for overcoming this problem, one of which adopts a full orthogonalization scheme in which each newly generated Lanczos vector is orthogonalized against all of its predecessors [5]. This, however, is computationally expensive, since it is not known in advance how many steps are required before an accurate solution is computed. In an attempt to further reduce this overhead, researchers have also developed a number of implicit and explicit restart strategies which restart the process at certain points using better approximations to the required eigenvectors [6], [7], [8], [9], [10].
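A full orthogonalization step of this kind can be sketched as a Gram–Schmidt sweep against all earlier Lanczos vectors; the fragment below is illustrative only, and robust implementations repeat the sweep when severe cancellation is detected.

```python
import numpy as np

def full_reorthogonalize(w, Q, j):
    """Orthogonalize the candidate Lanczos vector w against all
    previously generated Lanczos vectors q_1, ..., q_j (the first
    j columns of Q).  A single classical Gram-Schmidt sweep is
    shown; its cost grows with j, which is why full
    orthogonalization becomes expensive for long runs."""
    for i in range(j):
        w = w - (Q[:, i] @ w) * Q[:, i]
    return w
```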

LExpRes is a k-step explicit restart variant of the Lanczos algorithm which incorporates full orthogonalization of the Lanczos vectors [9]. Further, to prevent reconvergence to eigenvalues that have already been computed, each newly computed Lanczos vector is also orthogonalized against all previously converged eigenvectors, a process known as full external orthogonalization. As the number of eigenvalues requested increases, so does the computational expense of this process. In this paper, an alternative selective external orthogonalization scheme is proposed which significantly reduces this overhead and which enables two new variants of LExpRes to be constructed, each of which may be efficiently implemented in a distributed memory MIMD environment. A brief description of the LExpRes algorithm is presented in Section 2. Section 3 provides a detailed description of the external orthogonalization scheme and the two variants, LExpExt and LExpEst, which incorporate it. Implementation details, numerical experiments, results, and conclusions are discussed in the remaining sections.

Section snippets

Lanczos with explicit restart, LExpRes

In the explicit restart method, LExpRes, the p largest eigenvalues of A, together with their corresponding eigenvectors, are computed one at a time in descending order of magnitude. A brief description of LExpRes, in which a restart of the Lanczos process occurs after the computation of each eigenpair, is given below.
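The one-eigenpair-at-a-time strategy just outlined can be sketched in code. The fragment below is an illustrative reconstruction, not the authors' implementation: it computes the largest algebraic eigenvalues, uses a simple relative-residual convergence test, and applies full external orthogonalization at every step; the names and tolerances are assumptions.

```python
import numpy as np

def lexpres_sketch(A, p, k, tol=1e-8, max_restarts=200):
    """Illustrative sketch of explicit restart: the p largest
    (algebraic) eigenpairs are found one at a time; each k-step
    Lanczos run restarts from the current Ritz vector, and every
    iterate is orthogonalized against the converged eigenvectors X
    (full external orthogonalization)."""
    n = A.shape[0]
    X = np.zeros((n, 0))                      # converged eigenvectors
    found = []
    rng = np.random.default_rng(0)
    for _ in range(p):
        v = rng.standard_normal(n)
        for _ in range(max_restarts):
            v -= X @ (X.T @ v)                # deflate the start vector
            v /= np.linalg.norm(v)
            Q = np.zeros((n, k))
            alpha = np.zeros(k)
            beta = np.zeros(k - 1)
            q, q_prev, b = v, np.zeros(n), 0.0
            for j in range(k):
                Q[:, j] = q
                w = A @ q - b * q_prev
                alpha[j] = q @ w
                w -= alpha[j] * q
                w -= X @ (X.T @ w)            # external orthogonalization
                w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full internal
                b = np.linalg.norm(w)
                if j + 1 < k:
                    beta[j] = b
                    q_prev, q = q, w / b
            T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
            theta, S = np.linalg.eigh(T)
            v = Q @ S[:, -1]                  # Ritz vector, largest Ritz value
            if np.linalg.norm(A @ v - theta[-1] * v) < tol * abs(theta[-1]):
                break                         # eigenpair converged; restart done
        found.append(theta[-1])
        X = np.column_stack([X, v / np.linalg.norm(v)])
    return np.array(found), X
```

The inner products hidden in the two `X @ (X.T @ ...)` lines are exactly the external orthogonalization cost that the schemes of Section 3 aim to reduce.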

Suppose that approximate eigenpairs (λ1,x1),…,(λi,xi) are given for i<p. Let Xi=span{x1,…,xi} and let its orthogonal complement in Rn be Xi⊥. The Lanczos algorithm will converge to

Orthogonalization

LExpExt and LExpEst are variants of LExpRes which incorporate schemes similar to those of Grimes et al. [3] for reducing external orthogonalization. The differences occur in the function Lanczos_for_one_eig( ). In LExpExt (LExpEst) this function is modified so that external orthogonalization does not take place until the level (approximate level) of orthogonality between the required Ritz vector and the previously computed eigenvectors exceeds ϵ. Further, as soon as the appropriate condition
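The gating idea can be illustrated as follows. This sketch is not the authors' code: it computes the level of orthogonality directly (which itself costs the inner products), whereas the point of LExpEst is to estimate that level cheaply by a recurrence, which is omitted here; the function name and return convention are assumptions.

```python
import numpy as np

def maybe_orthogonalize(w, X, eps):
    """Apply external orthogonalization of w against the converged
    eigenvectors (columns of X) only once the level of orthogonality
    has degraded past eps; below the threshold the update is skipped."""
    if X.shape[1] == 0:
        return w, False
    c = X.T @ w                                   # the inner products
    level = np.max(np.abs(c)) / np.linalg.norm(w)
    if level <= eps:
        return w, False                           # still acceptably orthogonal
    return w - X @ c, True                        # threshold exceeded
```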

Implementation

The algorithms were implemented on the CRAY-T3D and the CRAY-T3E using a reverse communication strategy in which control is returned to the user when a matrix–vector product of the form Aqj is required. The user provides the code for this operation which is optimized for the target machine. Thus, for the purpose of this paper, matrix–vector products are implemented using the shmem_get routine which is available in the Shared Memory Library (SHMEM), on all Cray MPP systems.
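The reverse communication pattern can be illustrated with a generator-based sketch: the solver never touches A directly, but returns control whenever it needs a product A·q, which the caller computes however it likes (SHMEM-optimized code on the Cray MPP systems in the paper, a plain matmul here). None of this is the authors' code.

```python
import numpy as np

def lanczos_rc(n, m):
    """Reverse-communication Lanczos: yields each vector q for which
    the caller must send back y = A @ q, then returns the tridiagonal
    coefficients (alpha, beta)."""
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = np.random.default_rng(1).standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev, b = np.zeros(n), 0.0
    for j in range(m):
        y = yield q              # control returns to the caller here
        w = y - b * q_prev
        alpha[j] = q @ w
        w -= alpha[j] * q
        b = np.linalg.norm(w)
        if j + 1 < m:
            beta[j] = b
            q_prev, q = q, w / b
    return alpha, beta

def run(A, m):
    """Driver: the user-supplied matrix-vector product lives out here,
    completely decoupled from the solver."""
    it = lanczos_rc(A.shape[0], m)
    q = next(it)
    while True:
        try:
            q = it.send(A @ q)   # machine-optimized product goes here
        except StopIteration as stop:
            return stop.value
```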

The algorithms also

Numerical experiments

The algorithms LExpExt and LExpEst have been implemented on the Cray-T3D and their performances have been compared with that of LExpRes using four sparse symmetric matrices selected from the Harwell Boeing collection [1]. The experiments primarily assess the effectiveness of the scheme for reducing the external orthogonalization described in Section 3. They also compare and analyse the efficiency with which the algorithms are implemented in a massively parallel, distributed memory environment,

Conclusion

The results of the numerical experiments show that, in general, the new algorithms require fewer inner products for external orthogonalization than the original. In particular, LExpExt proved to be superior in this respect whenever the number of eigenvalues requested exceeded four. However, the percentage reduction in the number of inner products required by both new algorithms proved to be the greatest when the number of eigenvalues requested was small. It was also observed that, in general,

References (10)

  • M. Szularz et al., Explicitly restarted Lanczos algorithms in an MPP environment, Parallel Computing (1999)
  • I.S. Duff, R.G. Grimes, J.G. Lewis, User's Guide for the Harwell–Boeing Sparse Matrix Collection (Release I), 1992, ...
  • G. Golub et al., Matrix Computations (1989)
  • R.G. Grimes et al., A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems, SIAM J. Matrix Anal. Appl. (1994)
  • W. Hoffman, Iterative algorithms for Gram–Schmidt orthogonalization, Computing (1989)


This work was carried out using the facilities of the University of Edinburgh Parallel Computing Centre and the Computer Services for Academic Research at the University of Manchester.
