Abstract
Single-cell ribonucleic acid sequencing (scRNA-seq) allows researchers to study cell heterogeneity and diversity at the level of individual cells. Cell clustering is an essential step in scRNA-seq data processing. However, the high dimensionality and high noise levels of scRNA-seq data can complicate this analysis. Although many methods are available for scRNA-seq clustering, most of them ignore the topological relationships in scRNA-seq data and do not fully exploit the potential associations between cells. In this study, we present scGAD, a graph attention autoencoder model with a dual decoder structure for clustering scRNA-seq data. scGAD uses a graph attention autoencoder with two decoders to learn latent feature representations of cells. To ensure that the learned latent representation preserves both node properties and graph structure, an inner product decoder reconstructs the graph structure and a learnable graph attention decoder reconstructs the node properties. On 12 real scRNA-seq datasets, scGAD achieves average normalized mutual information (NMI) and adjusted Rand index (ARI) scores of 0.762 and 0.695, respectively, outperforming state-of-the-art single-cell clustering approaches. Biological analysis shows that the cell labels predicted by scGAD can support downstream analysis of scRNA-seq data.
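The dual-decoder idea described in the abstract can be illustrated with a small NumPy sketch. This is not the scGAD implementation from the repository; it is a minimal example under our own assumptions: a single-head GAT-style attention layer, a symmetric k-nearest-neighbour cell graph, a tanh latent activation, and an unweighted sum of a binary cross-entropy structure loss and a mean-squared-error attribute loss. The helper names (`knn_graph`, `gat_layer`, `dual_decoder_loss`) are hypothetical, the weights are random, and no training loop is shown.

```python
import numpy as np

# Illustrative sketch only: graph construction, layer sizes, activations and
# losses are assumptions, not the scGAD implementation.


def knn_graph(X, k=3):
    """Hypothetical helper: symmetric k-NN cell graph (Euclidean), no self-loops."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                  # a cell is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    A = np.zeros_like(d)
    np.put_along_axis(A, idx, 1.0, axis=1)
    return np.maximum(A, A.T)                    # symmetrise the adjacency


def gat_layer(X, A, W, a_src, a_dst, slope=0.2):
    """Single-head GAT-style attention layer (assumed form)."""
    H = X @ W                                    # (n, d_out) projected features
    e = (H @ a_src)[:, None] + (H @ a_dst)[None, :]
    e = np.where(e > 0, e, slope * e)            # LeakyReLU attention logits
    mask = (A + np.eye(A.shape[0])) > 0          # neighbours plus self-loop
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)    # row-wise softmax over neighbours
    return alpha @ H                             # attention-weighted aggregation


def inner_product_decoder(Z):
    """Edge probabilities sigmoid(Z Z^T): reconstructs the graph structure."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))


def dual_decoder_loss(X, A, params, eps=1e-9):
    """Hypothetical helper: attention encoder, two decoders, combined loss."""
    Z = np.tanh(gat_layer(X, A, *params["enc"]))  # latent cell embeddings
    A_hat = inner_product_decoder(Z)              # structure decoder
    X_hat = gat_layer(Z, A, *params["dec"])       # attention-based attribute decoder
    struct = -np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))
    attr = np.mean((X - X_hat) ** 2)
    return Z, A_hat, X_hat, struct + attr         # unweighted sum (assumption)


rng = np.random.default_rng(0)
n_cells, n_genes, dim = 20, 50, 8
X = np.log1p(rng.poisson(1.0, size=(n_cells, n_genes)).astype(float))  # toy counts
A = knn_graph(X, k=3)
params = {
    "enc": (rng.normal(scale=0.1, size=(n_genes, dim)), rng.normal(size=dim), rng.normal(size=dim)),
    "dec": (rng.normal(scale=0.1, size=(dim, n_genes)), rng.normal(size=n_genes), rng.normal(size=n_genes)),
}
Z, A_hat, X_hat, loss = dual_decoder_loss(X, A, params)
print(Z.shape, A_hat.shape, X_hat.shape, round(float(loss), 3))
```

In a full model, the encoder and decoder parameters would typically be trained by gradient descent on such a loss, and the latent embeddings `Z` would then be clustered. NMI and ARI against reference labels can be computed with scikit-learn's `normalized_mutual_info_score` and `adjusted_rand_score`.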
Data Availability
The datasets used in this study can be found at https://github.com/ZzzOctopus/scGAD.
Code Availability
scGAD is implemented in Python, and the source code is available at https://github.com/ZzzOctopus/scGAD.
Funding
This work was supported by the National Key Research and Development Project of China (2021YFA1000102, 2021YFA1000103).
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Financial interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available in the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, S., Zhang, Y., Zhang, Y. et al. Graph attention autoencoder model with dual decoder for clustering single-cell RNA sequencing data. Appl Intell 54, 5136–5146 (2024). https://doi.org/10.1007/s10489-024-05442-w
DOI: https://doi.org/10.1007/s10489-024-05442-w