Abstract
Over the past decade, high-throughput sequencing technologies have driven a dramatic increase in single-cell RNA sequencing (scRNA-seq) data. The study of scRNA-seq data has widened the scope and depth of researchers’ understanding of cellular heterogeneity. A prerequisite for studying heterogeneous cell populations is accurate cell type identification. However, the noisy, high-dimensional nature of scRNA-seq data makes it difficult for existing methods to further improve identification accuracy. Principal component analysis (PCA) is an important data analysis technique that is widely used to identify cell subpopulations. Building on PCA, we propose correntropy-based hypergraph regularized sparse PCA (CHLPCA) for accurate cell type identification. In addition to using correntropy to reduce the effect of noise, CHLPCA captures higher-order relationships between samples by constructing a hypergraph, which compensates for PCA's limited ability to capture local structure. Furthermore, we introduce the L2,1/5-norm into the model to enhance the interpretability of the principal components (PCs), which further improves model performance. In clustering experiments, CHLPCA outperforms the best comparative method by 5.13% in ACC and 8.00% in NMI. Clustering visualization experiments also confirm that CHLPCA performs the cell type identification task better.
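To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch of an empirical correntropy measure and an L2,p quasi-norm with p = 1/5. It is not the authors' implementation: the function names, the unnormalised Gaussian kernel, the kernel width `sigma`, and the choice to take the norm over rows are all assumptions made for illustration.

```python
import numpy as np


def correntropy(x, y, sigma=1.0):
    """Empirical correntropy of x and y under an unnormalised Gaussian kernel.

    Assumption: the kernel is exp(-e^2 / (2 * sigma^2)) without the
    1 / (sqrt(2 * pi) * sigma) factor; sigma is a hypothetical default.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Large errors are mapped close to 0, so outliers barely affect the mean.
    return float(np.mean(np.exp(-diff ** 2 / (2.0 * sigma ** 2))))


def l2p_norm(M, p=0.2):
    """L2,p quasi-norm: (sum_i ||row_i||_2 ** p) ** (1 / p).

    Assumption: the inner L2 norm is taken over rows; p = 1/5 mirrors the
    L2,1/5-norm mentioned in the abstract.
    """
    row_norms = np.linalg.norm(np.asarray(M, dtype=float), axis=1)
    return float(np.sum(row_norms ** p) ** (1.0 / p))
```

Identical vectors give a correntropy of 1.0, and a single grossly corrupted entry lowers it by at most 1/n, whereas the same entry can dominate a squared-error loss. With p < 1, the L2,p quasi-norm penalises matrices whose energy is spread over many rows more strongly than matrices with a few non-zero rows, which is why such penalties are used to encourage row sparsity.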
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 62172253).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, TG., Kong, XZ., Li, SJ., Wang, J. (2023). CHLPCA: Correntropy-Based Hypergraph Regularized Sparse PCA for Single-Cell Type Identification. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_44
DOI: https://doi.org/10.1007/978-981-99-7074-2_44
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7073-5
Online ISBN: 978-981-99-7074-2
eBook Packages: Computer Science, Computer Science (R0)