Recent Machine Learning Approaches for Single-Cell RNA-seq Data Analysis

Part of the book series: Studies in Computational Intelligence (SCI, volume 891)

Abstract

DNA sequencing has become an extremely popular assay, with some researchers claiming that, in the distant future, its impact will equal that of the microscope. Single-cell RNA-seq (scRNA-seq) is an emerging DNA-sequencing technology with promising capabilities, but it poses major computational challenges because of the large scale of the data it generates. Because sequencing costs are steadily decreasing, the volume and complexity of the data generated by these technologies will keep increasing. Consequently, major computational challenges arise at the cell level, in particular from the ultra-high dimensionality of scRNA-seq data. The main challenges relate to three pillars of machine learning (ML) analysis: classification, clustering, and visualization. Although there has been remarkable progress in ML methods for scRNA-seq data analysis, numerous questions remain unresolved. This review surveys state-of-the-art classification, clustering, and visualization methods tailored to single-cell transcriptomics data.

Acknowledgements

This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No 1901.

Author information

Corresponding author

Correspondence to Aristidis G. Vrahatis.

Copyright information

© 2020 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Vrahatis, A.G., Tasoulis, S.K., Maglogiannis, I., Plagianakos, V.P. (2020). Recent Machine Learning Approaches for Single-Cell RNA-seq Data Analysis. In: Maglogiannis, I., Brahnam, S., Jain, L. (eds) Advanced Computational Intelligence in Healthcare-7. Studies in Computational Intelligence, vol 891. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-61114-2_5
