Abstract
Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables the gene expression of every cell in different tissues to be analyzed at high resolution. Typically, an scRNA-seq experiment produces a large volume of stochastic data affected by both biological and technical noise. Dropout events are one example of such noise: no RNA is detected for a gene that is truly expressed in the cell, so entries that should be non-zero appear as zeros in the gene expression matrix. This paper presents a novel method for imputing dropout events that improves the visualization and identification of cell populations in scRNA-seq data. The proposed method, which we call EinImpute, is a local method that first clusters the cells. Next, within each cluster independently, it builds a k-nearest-neighbor (KNN) graph of genes to discover potential trajectories between genes. Finally, the data are reconstructed using these gene networks and linear regression models, and the dropout values are estimated. Averaged across different clustering methods, EinImpute achieved an overall improvement rate of 16.57%; among the other imputation methods, DrImpute ranked second with an improvement rate of 4.04%. In addition, t-SNE visualizations show that, compared with other well-known imputation methods, EinImpute represents cells with considerably better quality.
References
Afshar S, Mosleh M, Kheyrandish M (2013) Presenting a new multiclass classifier based on learning automata. Neurocomputing 104:97–104
Arthur D, Vassilvitskii S (2007) k-means++: The advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, Philadelphia, PA, USA, pp 1027–1035
Buettner F, Natarajan K, Casale F, Proserpio V, Scialdone A, Theis F et al (2015) Computational analysis of cell-to-cell heterogeneity in single-cell RNA-Sequencing data reveals hidden subpopulations of cells. Nat Biotechnol 33(2):155–160
van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ et al (2018) MAGIC: recovering gene interaction from single-cell data using data diffusion. Cell 174(3):716–729
Einipour A, Mosleh M, Ansari-Asl K (2020a) FSPAM: a feature construction method to identifying cell populations in ScRNA-seq data. CMES 122(1):377–397
Einipour A, Mosleh M, Ansari-Asl K (2020b) A graph-based clustering approach to identify cell populations in single-Cell RNA sequencing data. J Health Biomed Inform 7(1):60–72
Gong W, Kwak I, Koyano-Nakagawa N, Garry D (2018) DrImpute: imputing dropout events in single cell RNA sequencing data. BMC Bioinformatics 19:220
Grün D, Lyubimova A, Kester L, Wiebrands K, Basak O, Sasaki N et al (2015) Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525(7568):251–255
Guo M, Wang H, Potter SS, Whitsett JA, Xu Y (2015) SINCERA: a pipeline for single-cell RNA-seq profiling analysis. PLoS Comput Biol 11(11):e1004575
Kiselev VY, Kirschner K, Schaub MT, Andrews T, Yiu A, Chandra T et al (2017) SC3: consensus clustering of single-cell RNA-seq data. Nat Methods 14(5):483–486
Kolodziejczyk AA, Kim JK, Tsang JC, Ilicic T, Henriksson J, Natarajan KN et al (2015) Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17(4):471–485
Li WV, Li JJ (2018) An accurate and robust imputation method scImpute for single cell RNA-seq data. Nat Commun 9(1):997
Lin P, Troup M, Ho JW (2017) CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol 18(1):59
Liu SJ, Nowakowski TJ, Pollen AA, Lui JH, Horlbeck MA, Attenello FJ et al (2016) Single-cell analysis of long non-coding RNAs in the developing human neocortex. Genome Biol 17:67
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Maier M, Hein M, von Luxburg U (2009) Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theor Comput Sci 410(19):1749–1764
Nelson AC, Mould AW, Bikoff EK, Robertson EJ (2016) Single-cell RNA-seq reveals cell type-specific transcriptional signatures at the maternal–foetal interface during pregnancy. Nat Commun 7:11414
Pierson E, Yau C (2015) ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol 16:241
Pouyan MB, Jindal V, Birjandtalab J, Nourani M (2016a) Single and multi-subject clustering of flow cytometry data for cell-type identification and anomaly detection. BMC Med Genomics 9:41
Pouyan MB, Jindal V, Nourani M (2016b) Clinical outcome prediction using single-cell data. IEEE Trans Biomed Circuits Syst 10(5):1012–1022
Prabhakaran S, Azizi E, Carr A, Pe'er D (2016) Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. JMLR Workshop Conf Proc 48:1070–1079
Segerstolpe Å, Palasantza A, Eliasson P, Andersson EM, Andreasson AC, Sun X et al (2016) Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab 24(4):593–607
Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N et al (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6(5):377–382
Tasic B, Menon V, Nguyen TN, Kim TK, Jarsky T, Yao Z et al (2016) Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat Neurosci 19(2):335–346
Usoskin D, Furlan A, Islam S, Abdo H, Lönnerberg P, Lou D et al (2015) Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nat Neurosci 18(1):145–153
Villani AC, Satija R, Reynolds G, Sarkizova S, Shekhar K, Fletcher J et al (2017) Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356(6335):eaah4573
Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M et al (2014) Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 11(3):333–337
Xu C, Su Z (2015) Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 31(12):1974–1980
Gan Y, Li N, Zou G, Xin Y, Guan J (2018) Identification of cancer subtypes from single-cell RNA-seq data using a consensus clustering method. BMC Med Genomics 11(Suppl 6):117
Zhang L, Zhang S (2020) Comparison of computational methods for imputing single-cell RNA-sequencing data. IEEE/ACM Trans Comput Biol Bioinform 17(2):376–389
Žurauskienė J, Yau C (2016) pcaReduce: hierarchical clustering of single cell transcriptional profiles. BMC Bioinformatics 17:140
About this article
Cite this article
Einipour, A., Mosleh, M. & Ansari-Asl, K. EinImpute: a local and gene-based approach to imputation of dropout events in ScRNA-seq data. J Ambient Intell Human Comput 14, 3225–3237 (2023). https://doi.org/10.1007/s12652-021-03463-8
DOI: https://doi.org/10.1007/s12652-021-03463-8