Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection

  • Conference paper
High Performance Computing for Computational Science -- VECPAR 2014 (VECPAR 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Abstract

We consider computing tall-skinny QR factorizations on a large-scale parallel machine. We present a realistic performance model and analyze the difference in parallel execution time between Householder QR and TSQR. Our analysis indicates that TSQR can become slower than Householder QR as the number of columns of the target matrix increases. We aim to estimate this difference and select the faster algorithm using the models, which is a form of auto-tuning. Numerical experiments on the K computer support our analysis and show that the faster algorithm is successfully determined.
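The paper's own "realistic" model is not reproduced on this page. As a rough illustration of why TSQR can lose to Householder QR as the number of columns n grows, the sketch below uses a plain latency-bandwidth-flop (α-β-γ) model with the leading-order counts reported by Demmel, Grigori, Hoemmen and Langou for binary-tree TSQR and for 1D block-row parallel Householder QR on P processes. The machine parameters and the names `Machine`, `time_tsqr`, `time_householder`, `select_algorithm` and `crossover_columns` are hypothetical, and the model assumes every flop costs the same time γ, which real BLAS-2 and BLAS-3 kernels do not.

```python
import math
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine parameters for an alpha-beta-gamma cost model."""
    alpha: float  # seconds per message (latency)
    beta: float   # seconds per word moved (inverse bandwidth)
    gamma: float  # seconds per floating-point operation


def model_time(flops: float, msgs: float, words: float, mach: Machine) -> float:
    """Critical-path time = computation + latency + bandwidth terms."""
    return flops * mach.gamma + msgs * mach.alpha + words * mach.beta


def time_householder(m: int, n: int, p: int, mach: Machine) -> float:
    """Leading-order cost of 1D block-row parallel Householder QR."""
    lg = math.log2(p)
    flops = 2 * m * n**2 / p - 2 * n**3 / (3 * p)
    msgs = 2 * n * lg        # two tree reductions per column (norm, then update)
    words = n**2 / 2 * lg
    return model_time(flops, msgs, words, mach)


def time_tsqr(m: int, n: int, p: int, mach: Machine) -> float:
    """Leading-order cost of binary-tree TSQR."""
    lg = math.log2(p)
    # Extra n^3 term: QR of two stacked n-by-n R factors at every tree level.
    flops = 2 * m * n**2 / p + 2 * n**3 / 3 * lg
    msgs = lg                # one message per tree level
    words = n**2 / 2 * lg    # one upper-triangular R factor per level
    return model_time(flops, msgs, words, mach)


def select_algorithm(m: int, n: int, p: int, mach: Machine) -> str:
    """Return the algorithm this simple model predicts to be faster."""
    if time_tsqr(m, n, p, mach) < time_householder(m, n, p, mach):
        return "TSQR"
    return "Householder QR"


def crossover_columns(m: int, p: int, mach: Machine, n_max: int = 10_000):
    """Smallest n for which the model prefers Householder QR, or None."""
    for n in range(1, n_max + 1):
        if select_algorithm(m, n, p, mach) == "Householder QR":
            return n
    return None


# Hypothetical machine: 1 us latency, 1 ns/word, 10 Gflop/s per process.
mach = Machine(alpha=1e-6, beta=1e-9, gamma=1e-10)
print(select_algorithm(10**7, 16, 1024, mach))    # TSQR
print(select_algorithm(10**7, 1000, 1024, mach))  # Householder QR
print(crossover_columns(10**7, 1024, mach))
```

Under this simplified model the bandwidth terms and the 2mn²/P terms cancel, so the choice reduces to whether TSQR's extra (2/3)n³ log₂P flops outweigh Householder QR's extra (2n − 1) log₂P messages. Because the former grows as n³ and the latter only as n, a crossover exists for any fixed P > 1, which is consistent with the trend the abstract describes.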


Notes

  1. The K computer is operated at the RIKEN Advanced Institute for Computational Science.


Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments. The first author appreciates the fruitful discussion with Dr. Mark Hoemmen at iWAPT2014. This research was supported by JST, CREST, and used computational resources of the K computer provided by RIKEN AICS through the HPCI System Research project (Project ID: hp120170).

Author information

Corresponding author

Correspondence to Takeshi Fukaya.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Fukaya, T., Imamura, T., Yamamoto, Y. (2015). Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_23


  • DOI: https://doi.org/10.1007/978-3-319-17353-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17352-8

  • Online ISBN: 978-3-319-17353-5

  • eBook Packages: Computer Science, Computer Science (R0)
