Abstract
The QR decomposition is one of the fundamental matrix decompositions in data mining. A particularly challenging case arises when the matrix is tall and skinny, that is, when it has far more rows than columns. Tall-skinny QR has many applications, including Krylov subspace methods and some subspace projection methods, and it can also accelerate principal component analysis (PCA). Although algorithms such as TSQR and Cholesky QR have been proposed for computing QR decompositions of tall-and-skinny matrices, none of them is well suited to the GPGPU, which is increasingly widely used. In view of the limited memory of the GPGPU and the costly data transmission between CPU and GPGPU, we propose a novel R-initiated TSQR that makes computing tall-and-skinny QR on the GPGPU efficient. Specifically, our method is unique in that it uses Givens QR to exploit the dual-triangular (DT) structure of the submatrices that arise in TSQR, which significantly reduces the computation required. With the R-initiated approach, our method not only meets the memory limitation of the GPGPU but also avoids large amounts of data transmission. We derive theoretical results that show the merit of the proposed method, and experimental results indicate that it significantly outperforms the conventional TSQR.
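To make the idea of combining triangular factors concrete, the following is a minimal CPU sketch in NumPy, not the authors' GPU implementation. It assumes that the dual-triangular structure refers to two upper triangular R factors stacked on top of each other, which is one reading of the abstract. The function names `givens_merge` and `tsqr_r`, the block count, and the matrix sizes are hypothetical choices made for illustration only. The sketch computes only the R factor, and each row block must have at least as many rows as the matrix has columns.

```python
import numpy as np


def givens_merge(r_top, r_bot):
    """Return the R factor of the stacked matrix [r_top; r_bot].

    Both inputs are assumed to be n x n upper triangular. Only the
    n(n+1)/2 entries in the upper triangle of r_bot are eliminated,
    each with one Givens rotation.
    """
    n = r_top.shape[0]
    top = np.array(r_top, dtype=float)  # working copies
    bot = np.array(r_bot, dtype=float)
    for i in range(n):          # row of the lower block to eliminate
        for j in range(i, n):   # columns left of i are already zero
            a, b = top[j, j], bot[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue        # nothing to eliminate
            c, s = a / r, b / r
            t = top[j, j:].copy()
            u = bot[i, j:].copy()
            # Rotate row j of the top block against row i of the
            # bottom block; this zeroes bot[i, j].
            top[j, j:] = c * t + s * u
            bot[i, j:] = -s * t + c * u
    return top


def tsqr_r(a, num_blocks):
    """R factor of a tall-skinny matrix via sequential block merging."""
    blocks = np.array_split(a, num_blocks, axis=0)
    r = np.linalg.qr(blocks[0], mode="r")
    for block in blocks[1:]:
        r = givens_merge(r, np.linalg.qr(block, mode="r"))
    return r


rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 8))   # illustrative tall-skinny matrix
r = tsqr_r(a, num_blocks=4)
print(np.allclose(r.T @ r, a.T @ a))      # R reproduces the Gram matrix
print(np.allclose(np.tril(r, -1), 0.0))   # R is upper triangular
```

Because the rotations touch only the nonzero upper triangle of the lower block, the merge step does less work than a dense QR of the 2n x n stack. The sequential merge order used here is just one option; TSQR is commonly described with a reduction tree.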
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, NY., Chen, MS. (2019). Exploring Dual-Triangular Structure for Efficient R-Initiated Tall-Skinny QR on GPGPU. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11440. Springer, Cham. https://doi.org/10.1007/978-3-030-16145-3_45
DOI: https://doi.org/10.1007/978-3-030-16145-3_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16144-6
Online ISBN: 978-3-030-16145-3
eBook Packages: Computer Science, Computer Science (R0)