EinImpute: a local and gene-based approach to imputation of dropout events in ScRNA-seq data

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables high-resolution analysis of individual cells across different tissues. The output of an scRNA-seq experiment is typically a large volume of stochastic data with biological and technical noise. Dropout events are one such source of noise: no RNA is measured for a gene that is in fact expressed in the cell, so entries that should be non-zero appear as zero in the gene expression matrix. This paper presents a novel method for imputing dropout events that improves the final results of visualizing and identifying cell populations in scRNA-seq data. The proposed method, called EinImpute, is a local method that first clusters the cells. Next, independently within each cluster, it builds a KNN graph of genes that discovers potential trajectories between genes. Finally, using these gene networks and linear regression models, it reconstructs the data and estimates the dropout events. Averaged over different clustering methods, EinImpute achieved an overall improvement rate of 16.57%. Among the other imputation methods, DrImpute ranked second with an improvement rate of 4.04%. Visualization with tSNE also shows that, compared with other well-known imputation methods, EinImpute represents cells with far better quality.
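The three steps named in the abstract (cluster the cells, build a per-cluster KNN graph of genes, fit linear regressions to estimate dropouts) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the gene similarity measure, or the regression setup, so plain k-means, Pearson correlation, and ordinary least squares are assumed here, and every function name and parameter (`kmeans`, `gene_knn`, `impute_cluster`, `impute`, `k`, `min_obs`) is hypothetical.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X (cells); returns one label per cell."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=n_clusters, replace=False)]
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def gene_knn(X, k):
    """For each gene (column), indices of the k most correlated other genes."""
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # constant genes get zero similarity instead of NaN
    Z = Xc / norms
    sim = Z.T @ Z  # Pearson correlation between genes (assumed similarity)
    np.fill_diagonal(sim, -np.inf)  # a gene is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def impute_cluster(X, k=5, min_obs=3):
    """Fill the zeros of each gene from a linear fit on its k neighbour genes."""
    n_cells, n_genes = X.shape
    k = min(k, n_genes - 1)
    neighbours = gene_knn(X, k)
    out = X.copy()
    for g in range(n_genes):
        zero = X[:, g] == 0
        observed = ~zero
        # Skip genes with nothing to impute or too few observed cells to fit.
        if not zero.any() or observed.sum() < min_obs:
            continue
        # Design matrix: intercept plus the neighbour genes' expression.
        A = np.column_stack([np.ones(n_cells), X[:, neighbours[g]]])
        coef, *_ = np.linalg.lstsq(A[observed], X[observed, g], rcond=None)
        # Only zero entries are overwritten; predictions are clipped at zero.
        out[zero, g] = np.clip(A[zero] @ coef, 0.0, None)
    return out


def impute(X, n_clusters=2, k=5, seed=0):
    """Cluster cells, then impute zeros independently inside each cluster."""
    X = np.asarray(X, dtype=float)  # rows are cells, columns are genes
    labels = kmeans(X, n_clusters, seed=seed)
    out = X.copy()
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        out[idx] = impute_cluster(X[idx], k=k)
    return out, labels
```

Calling `impute(matrix, n_clusters=2, k=4)` on a cells-by-genes matrix returns the imputed matrix and the cluster labels. The sketch treats every zero as a candidate dropout, which is a simplification: it makes no attempt to separate true zeros from technical ones.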


References

  • Afshar S, Mosleh M, Kheyrandish M (2013) Presenting a new multiclass classifier based on learning automata. Neurocomputing 104:97–104

  • Arthur D, Vassilvitskii S (2007) K-means++: The advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, Philadelphia, PA, USA, pp 1027–1035

  • Buettner F, Natarajan K, Casale F, Proserpio V, Scialdone A, Theis F et al (2015) Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol 33(2):155–160

  • van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ et al (2018) MAGIC: recovering gene interaction from single-cell data using data diffusion. Cell 174(3):716–729

  • Einipour A, Mosleh M, Ansari-Asl K (2020a) FSPAM: a feature construction method to identifying cell populations in ScRNA-seq data. CMES 122(1):377–397

  • Einipour A, Mosleh M, Ansari-Asl K (2020b) A graph-based clustering approach to identify cell populations in single-Cell RNA sequencing data. J Health Biomed Inform 7(1):60–72

  • Gong W, Kwak I, Koyano-Nakagawa N, Garry D (2018) DrImpute: imputing dropout events in single cell RNA sequencing data. BMC Bioinformatics 19:220

  • Grün D, Lyubimova A, Kester L, Wiebrands K, Basak O, Sasaki N et al (2015) Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525(7568):251–255

  • Guo M, Wang H, Potter SS, Whitsett JA, Xu Y (2015) SINCERA: a pipeline for single-cell RNA-seq profiling analysis. PLoS Comput Biol 11(11):e1004575

  • Kiselev VY, Kirschner K, Schaub MT, Andrews T, Yiu A, Chandra T et al (2017) SC3: consensus clustering of single-cell RNA-seq data. Nat Methods 14(5):483–486

  • Kolodziejczyk AA, Kim JK, Tsang JC, Ilicic T, Henriksson J, Natarajan KN et al (2015) Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17(4):471–485

  • Li WV, Li JJ (2018) An accurate and robust imputation method scImpute for single cell RNA-seq data. Nat Commun 9(1):997

  • Lin P, Troup M, Ho JW (2017) CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol 18(1):59

  • Liu SJ, Nowakowski TJ, Pollen AA, Lui JH, Horlbeck MA, Attenello FJ et al (2016) Single-cell analysis of long non-coding RNAs in the developing human neocortex. Genome Biol 17:67

  • Maaten LVD, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  • Maier M, Hein M, von Luxburg U (2009) Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theoret Comput Sci 410(19):1749–1764

  • Nelson AC, Mould AW, Bikoff EK, Robertson EJ (2016) Single-cell RNA-seq reveals cell type-specific transcriptional signatures at the maternal–foetal interface during pregnancy. Nat Commun 7:11414

  • Pierson E, Yau C (2015) ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol 16(241):1–10

  • Pouyan MB, Jindal V, Birjandtalab J, Nourani M (2016a) Single and multi-subject clustering of flow cytometry data for cell-type identification and anomaly detection. BMC Med Genomics 9:41

  • Pouyan MB, Jindal V, Nourani M (2016b) Clinical outcome prediction using single-cell data. IEEE Trans Biomed Circuits Syst 10(5):1012–1022

  • Prabhakaran S, Azizi E, Carr A, Peer D (2016) Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. JMLR Workshop Conf Proc 48:1070–1079

  • Segerstolpe Å, Palasantza A, Eliasson P, Andersson EM, Andreasson AC, Sun X et al (2016) Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab 24(4):593–607

  • Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N et al (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6(5):377–382

  • Tasic B, Menon V, Nguyen TN, Kim TK, Jarsky T, Yao Z et al (2016) Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat Neurosci 19(2):335–346

  • Usoskin D, Furlan A, Islam S, Abdo H, Lönnerberg P, Lou D et al (2015) Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nat Neurosci 18(1):145–153

  • Villani AC, Satija R, Reynolds G, Sarkizova S, Shekhar K, Fletcher J et al (2017) Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356(6335):eaah4573

  • Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M et al (2014) Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 11(3):333–337

  • Xu C, Su Z (2015) Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 31(12):1974–1980

  • Gan Y, Li N, Zou G, Xin Y, Guan J (2018) Identification of cancer subtypes from single-cell RNA-seq data using a consensus clustering method. BMC Med Genomics 11(Suppl 6):117

  • Zhang L, Zhang S (2020) Comparison of computational methods for imputing single-cell RNA-sequencing data. IEEE/ACM Trans Comput Biol Bioinform 17(2):376–389

  • Žurauskienė J, Yau C (2016) pcaReduce: hierarchical clustering of single cell transcriptional profiles. BMC Bioinformatics 17:140

Author information

Corresponding author

Correspondence to Mohammad Mosleh.

Cite this article

Einipour, A., Mosleh, M. & Ansari-Asl, K. EinImpute: a local and gene-based approach to imputation of dropout events in ScRNA-seq data. J Ambient Intell Human Comput 14, 3225–3237 (2023). https://doi.org/10.1007/s12652-021-03463-8

  • DOI: https://doi.org/10.1007/s12652-021-03463-8
