Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10847)

Abstract

The development of single cell RNA sequencing (scRNA-seq) has enabled innovative approaches to investigating mRNA abundances. To extract the systematic patterns of scRNA-seq data in an unsupervised manner, we have developed two extensions of robust principal component analysis (RPCA). First, we present a truncated version of RPCA (tRPCA), which is much faster and more memory efficient. Second, we introduce noise reduction into tRPCA through \(L_2\) regularization (tRPCAL2). Unlike RPCA, which considers only a low-rank matrix L and a sparse matrix S, the proposed method can also extract a noise matrix E inherent in modern genomic data. We demonstrate its usefulness by applying our methods to peripheral blood mononuclear cell (PBMC) scRNA-seq data. In particular, clustering the low-rank matrix L yields a better classification of unlabeled single cells. Overall, the proposed variants are well suited to the high-dimensional and noisy data that are routinely generated in genomics.
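
The three-way split into a low-rank matrix L, a sparse matrix S, and a noise matrix E can be illustrated with a minimal Python sketch. It is not the authors' tRPCA or tRPCAL2 algorithm. It assumes a generic objective, \(\|L\|_* + \lambda \|S\|_1 + \frac{1}{2\mu} \|M - L - S\|_F^2\) with L restricted to rank at most k, and minimizes it by simple alternating updates. The function names, the parameters `k`, `lam`, `mu`, and `n_iter`, and the default value of `lam` are all assumptions made for illustration. Truncation is emulated by slicing a full SVD, whereas a genuinely truncated solver would compute only the leading k singular triplets.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def truncated_svt(x, tau, k):
    """Singular value thresholding restricted to the top-k singular triplets.

    Assumption for simplicity: a full SVD is computed and then sliced.
    A truncated solver would avoid computing the discarded triplets.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s_k = np.maximum(s[:k] - tau, 0.0)  # shrink the k leading singular values
    return (u[:, :k] * s_k) @ vt[:k, :]


def decompose(m, k, lam=None, mu=1.0, n_iter=100):
    """Split m into low-rank L, sparse S, and dense residual E = m - L - S.

    Illustrative objective (an assumption, not taken from the paper):
        ||L||_* + lam * ||S||_1 + 1 / (2 * mu) * ||m - L - S||_F^2
    subject to rank(L) <= k, minimized by alternating exact updates.
    """
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))  # common RPCA heuristic, assumed here
    low_rank = np.zeros_like(m, dtype=float)
    sparse = np.zeros_like(m, dtype=float)
    for _ in range(n_iter):
        # Update L with S fixed: thresholded truncated SVD of m - S.
        low_rank = truncated_svt(m - sparse, mu, k)
        # Update S with L fixed: elementwise shrinkage of m - L.
        sparse = soft_threshold(m - low_rank, lam * mu)
    noise = m - low_rank - sparse  # whatever L and S do not explain
    return low_rank, sparse, noise
```

Calling `decompose(m, k=3)` on a genes-by-cells matrix `m` returns three matrices of the same shape that sum back to `m`. Because the last step is an elementwise shrinkage, every entry of the residual E is bounded in magnitude by `lam * mu`, which is what makes E behave like a dense, small-amplitude noise term in this sketch.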

K. Gogolewski and M. Sykulski contributed equally to this work.

Acknowledgements

This work was supported by the Polish National Science Centre grants no. 2016/21/N/ST6/01507 and no. 2016/23/D/ST6/03613. The authors thank B. Miasojedow, Ph.D., for comments and suggestions.

Author information

Corresponding author

Correspondence to Krzysztof Gogolewski.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Gogolewski, K., Sykulski, M., Chung, N.C., Gambin, A. (2018). Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science, vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_32

  • DOI: https://doi.org/10.1007/978-3-319-94968-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94967-3

  • Online ISBN: 978-3-319-94968-0

  • eBook Packages: Computer Science, Computer Science (R0)
