Abstract
The development of single-cell RNA sequencing (scRNA-seq) has enabled innovative approaches to investigating mRNA abundances. In this study, we aim to extract the systematic patterns in scRNA-seq data in an unsupervised manner, and to that end we have developed two extensions of robust principal component analysis (RPCA). First, we present a truncated version of RPCA (tRPCA) that is much faster and more memory-efficient. Second, we introduce noise reduction into tRPCA through \(L_2\) regularization (tRPCAL2). Unlike RPCA, which considers only a low-rank matrix L and a sparse matrix S, the proposed method also extracts a noise matrix E that captures the noise inherent in modern genomic data. We demonstrate the usefulness of our methods by applying them to peripheral blood mononuclear cell (PBMC) scRNA-seq data. In particular, clustering the low-rank matrix L yields a better classification of unlabeled single cells. Overall, the proposed variants are well suited to the high-dimensional, noisy data routinely generated in genomics.
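The three-way split described above, a data matrix written as a low-rank part L, a sparse part S, and a noise part E, can be illustrated with a minimal sketch. This is not the authors' tRPCA or tRPCAL2 algorithm: it is a simple alternating scheme, assumed here for illustration only, that pairs a rank-k truncated SVD for L with entrywise soft-thresholding for S and treats the remainder as E. The function names, the rank `k`, the threshold `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise soft-thresholding: shrink values toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def truncated_lowrank(x, k):
    """Best rank-k approximation of x, keeping the k largest singular values."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


def decompose(m, k, lam, n_iter=50):
    """Illustrative split m = l + s + e (NOT the paper's algorithm).

    l: rank-at-most-k part, s: sparse part, e: dense residual ("noise").
    """
    s = np.zeros_like(m)
    for _ in range(n_iter):
        l = truncated_lowrank(m - s, k)   # low-rank update
        s = soft_threshold(m - l, lam)    # sparse update
    e = m - l - s                         # whatever remains is the noise part
    return l, s, e


# Synthetic stand-in for an expression matrix (hypothetical data).
rng = np.random.default_rng(0)
true_l = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))
mask = rng.random((60, 40)) < 0.05
true_s = np.zeros((60, 40))
true_s[mask] = 10.0 * rng.choice([-1.0, 1.0], size=mask.sum())
m = true_l + true_s + 0.01 * rng.normal(size=(60, 40))

lam = 1.0
l, s, e = decompose(m, k=2, lam=lam)
```

By construction, the three parts sum back to the input exactly, L has rank at most `k`, and every entry of E is bounded in magnitude by `lam`. Computing only the leading `k` singular triplets, rather than a full SVD as done here for brevity, is the step that a truncated SVD routine would make cheaper on large matrices.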
K. Gogolewski and M. Sykulski: These authors contributed equally to this work.
Acknowledgements
This work was supported by the Polish National Science Centre grants no. 2016/21/N/ST6/01507 and no. 2016/23/D/ST6/03613. The authors thank B. Miasojedow, Ph.D., for comments and suggestions.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Gogolewski, K., Sykulski, M., Chung, N.C., Gambin, A. (2018). Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science, vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_32
DOI: https://doi.org/10.1007/978-3-319-94968-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94967-3
Online ISBN: 978-3-319-94968-0
eBook Packages: Computer Science (R0)