
Clustering single-cell RNA-seq data with a model-based deep learning approach


Single-cell RNA sequencing (scRNA-seq) promises to provide higher resolution of cellular differences than bulk RNA sequencing. Clustering of transcriptomes profiled by scRNA-seq is routinely used to reveal cell heterogeneity and diversity. However, clustering analysis of scRNA-seq data remains a statistical and computational challenge, because pervasive dropout events obscure the data matrix with many ‘false’ zero count observations. Here, we have developed scDeepCluster, a single-cell model-based deep embedded clustering method, which simultaneously learns feature representation and clustering via explicit modelling of scRNA-seq data generation. In tests on extensive simulated data and on real datasets from four representative single-cell sequencing platforms, scDeepCluster outperformed state-of-the-art methods under various clustering performance metrics and exhibited improved scalability, with running time increasing linearly with sample size. Its accuracy and efficiency make scDeepCluster a promising algorithm for clustering large-scale scRNA-seq data.
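To make the dropout problem concrete, the following is a minimal illustrative sketch, not the scDeepCluster code. It assumes that true counts follow a negative binomial distribution and that each entry is independently zeroed with a fixed probability; the function name, the dropout rate, the dispersion and the gene means are all hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_counts_with_dropout(n_cells=200, n_genes=50, dropout_rate=0.3, seed=0):
    """Draw negative binomial counts, then zero entries at random to mimic dropout.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical per-gene mean expression levels.
    gene_means = rng.gamma(shape=2.0, scale=5.0, size=n_genes)
    dispersion = 2.0  # hypothetical NB dispersion parameter
    # NumPy parameterises the NB by (n, p) with mean n * (1 - p) / p,
    # so this choice of p gives each gene the mean in gene_means.
    p = dispersion / (dispersion + gene_means)
    true_counts = rng.negative_binomial(dispersion, p, size=(n_cells, n_genes))
    # Assumed dropout model: each entry is lost independently with a fixed probability.
    dropped = rng.random((n_cells, n_genes)) < dropout_rate
    observed = np.where(dropped, 0, true_counts)
    return true_counts, observed, dropped


true_counts, observed, dropped = simulate_counts_with_dropout()
# Zeros already present in the underlying expression.
true_zero_frac = float(np.mean(true_counts == 0))
# Zeros seen in the observed matrix (true zeros plus dropout).
observed_zero_frac = float(np.mean(observed == 0))
# 'False' zeros: entries that were expressed but were zeroed by dropout.
false_zero_frac = float(np.mean(dropped & (true_counts > 0)))

print(f"true zero fraction:     {true_zero_frac:.3f}")
print(f"observed zero fraction: {observed_zero_frac:.3f}")
print(f"false zero fraction:    {false_zero_frac:.3f}")
```

Under these assumptions, the observed zero fraction is the sum of the true zero fraction and the false zero fraction, which is why a clustering method that treats every zero as genuine absence of expression can be misled.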

Fig. 1: Network architecture of scDeepCluster.
Fig. 2: Simulation on evaluation.
Fig. 3: Benchmark results on four real scRNA-seq datasets with true labels.
Fig. 4: Applying scDeepCluster on various down-sampled simulated data.

Data availability

The scRNA-seq data that support the findings of this study are available in GitHub:

Code availability

The source code, weights of trained models and the real scRNA-seq data used for experiments of scDeepCluster are available in GitHub:


Z.W. and Q.S. conceived and supervised the project. Z.W. led the study. T.T. designed the methods and conducted the experiments with input from J.W. T.T., J.W. and Z.W. wrote the manuscript. All authors approved the manuscript.

Supplementary information

Figures, table and notes

