Abstract
The large number of genes in single-cell RNA sequencing (scRNA-seq) data can degrade clustering performance. To obtain high-quality genes for clustering, this study proposes a novel gene selection algorithm based on the Fisher score and a genetic algorithm with dynamic crossover (abbreviated as FDCGA). To reduce time and space complexity, FDCGA first uses the Fisher score to obtain preliminary candidate genes and then uses the genetic algorithm with dynamic crossover to select the genes that benefit data clustering and analysis. Experiments on several publicly available real-world scRNA-seq datasets show that FDCGA outperforms several competing methods in terms of both normalized mutual information (NMI) and adjusted Rand index (ARI) and delivers strong optimization performance. The convergence experiments show that the fitness of FDCGA increases with the number of iterations and converges to a stable value. Statistical analysis shows that the improvement of FDCGA over the competing methods is statistically significant.
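The Fisher score pre-filtering step can be illustrated with a minimal sketch. It uses the standard Fisher score (between-class scatter divided by within-class scatter, computed per gene); the exact variant used in FDCGA is not given in this abstract. The function names `fisher_scores` and `top_k_genes`, the cells-by-genes row layout, and the availability of a class label for each cell are assumptions made for illustration only. The genetic algorithm stage is not sketched, because the abstract does not specify its encoding, fitness function, or dynamic crossover rule.

```python
from collections import defaultdict


def fisher_scores(X, labels):
    """Return one Fisher score per gene (column of X).

    X      : list of rows, one row per cell, one value per gene (assumed layout)
    labels : one class label per cell (assumed to be available)
    """
    n_genes = len(X[0])

    # Group the cells (rows) by their class label.
    groups = defaultdict(list)
    for row, label in zip(X, labels):
        groups[label].append(row)

    scores = []
    for j in range(n_genes):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)

        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted variance inside each class
        for rows in groups.values():
            values = [r[j] for r in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c

        # A gene with zero within-class variance is given score 0.0 here
        # to avoid division by zero (a choice made for this sketch).
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_genes(X, labels, k):
    """Return the indices of the k genes with the highest Fisher score."""
    scores = fisher_scores(X, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: 4 cells, 2 genes; gene 0 separates the two classes, gene 1 does not.
toy_X = [[0, 5], [1, 6], [10, 5], [11, 6]]
toy_labels = [0, 0, 1, 1]
candidate_genes = top_k_genes(toy_X, toy_labels, k=1)
```

On real scRNA-seq matrices with thousands of genes, a vectorized array implementation would be the practical choice; plain Python is used here only to keep the sketch dependency-free. The retained gene indices would then form the search space for the second stage.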





Acknowledgements
We thank the anonymous reviewers and the authors of the cited references.
Funding
This research was supported in part by the Natural Science Foundation of Guangxi Province (Grant No. 2021GXNSFAA220076), the National Natural Science Foundation of China (Grant No. 62141207), and the Key Fields Project of Universities in Guangdong Province (Grant Nos. 2021ZDZX4109 and 2020ZDZX3119).
Author information
Contributions
JHF and JZ designed the study; XSZ implemented the experiments; JHW performed the data analysis; JHF and JZ wrote and revised the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no potential conflicts of interest with respect to the study, authorship, and publication of this article.
Data availability
The datasets supporting this study are publicly available and can be obtained from EMBL-EBI (https://www.ebi.ac.uk/) or the NCBI Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/).
About this article
Cite this article
Feng, J., Zhang, J., Zhu, X. et al. Gene selection and clustering of single-cell data based on Fisher score and genetic algorithm. J Supercomput 79, 7067–7093 (2023). https://doi.org/10.1007/s11227-022-04920-7
DOI: https://doi.org/10.1007/s11227-022-04920-7