ScSSC: Semi-supervised Single Cell Clustering Based on 2D Embedding

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12838)

Abstract

In recent years, the development of single-cell RNA sequencing (scRNA-seq) technology has generated a growing volume of scRNA-seq data. Analysis methods such as clustering have been proposed to distinguish cell types and reveal cell diversity. However, because a typical species has more than ten thousand genes, scRNA-seq data is very high-dimensional, and it also contains many zero counts. Both properties make clustering scRNA-seq data more difficult. This paper proposes ScSSC, a semi-supervised clustering method based on 2D embedding. ScSSC uses an autoencoder for pre-training to construct the network and applies a community discovery algorithm to label cells. Then, after training, a semi-supervised network is used to cluster the data. Clustering results on three public data sets show that ScSSC performs better than the other clustering methods compared.
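
The labelling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy count matrix, the log1p normalisation, the k-nearest-neighbour graph and the helper names (`log_normalize`, `knn_graph`, `label_propagation`) are all assumptions made for illustration, and simple label propagation stands in for the community discovery algorithm that the abstract does not specify. The autoencoder pre-training and the semi-supervised network are omitted.

```python
import math

def log_normalize(counts):
    """Apply log(1 + x) to raw counts (an assumed preprocessing step)."""
    return [[math.log1p(v) for v in row] for row in counts]

def knn_graph(cells, k):
    """Build a symmetric k-nearest-neighbour adjacency map (Euclidean distance)."""
    n = len(cells)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(cells[i], cells[j]), j)
                       for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def label_propagation(adj, max_iter=100):
    """Each node adopts its neighbours' most common label.

    Ties go to the smallest label so the result is deterministic.
    This is a stand-in for a community discovery algorithm.
    """
    labels = {i: i for i in adj}
    for _ in range(max_iter):
        changed = False
        for i in sorted(adj):
            votes = {}
            for j in adj[i]:
                votes[labels[j]] = votes.get(labels[j], 0) + 1
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return [labels[i] for i in sorted(adj)]

# Hypothetical toy data: 6 cells x 4 genes, with many zero counts.
counts = [
    [10, 0, 8, 0],
    [12, 0, 9, 0],
    [9, 0, 7, 1],
    [0, 11, 0, 6],
    [0, 9, 0, 7],
    [1, 12, 0, 5],
]

cells = log_normalize(counts)
graph = knn_graph(cells, k=2)
labels = label_propagation(graph)  # pseudo-labels, one per cell
print(labels)
```

In a pipeline like the one the abstract describes, pseudo-labels of this kind would serve as training targets for the semi-supervised network; here they are only printed.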

N. Shi and Y. Wu—Contributed equally to this work.

Acknowledgements

This work was supported by grants from the National Key Research Program (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503) and by the Shenzhen Science and Technology Program university stable support program (20200821222112001) to JL.

Competing interests

The authors declare that they have no competing interests.

Author information

Contributions

NS, YW and JL designed the study, performed bioinformatics analysis and drafted the manuscript. All of the authors performed the analysis and participated in the revision of the manuscript. JL conceived of the study, participated in its design and coordination and drafted the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Junyi Li.

Ethics declarations

All additional files are available at: https://github.com/NaiLeShi/-ScSSC.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Shi, N., Wu, Y., Du, L., Liu, B., Wang, Y., Li, J. (2021). ScSSC: Semi-supervised Single Cell Clustering Based on 2D Embedding. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_43

  • DOI: https://doi.org/10.1007/978-3-030-84532-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-84531-5

  • Online ISBN: 978-3-030-84532-2

  • eBook Packages: Computer Science (R0)
