Gene selection and clustering of single-cell data based on Fisher score and genetic algorithm

The Journal of Supercomputing

Abstract

The large number of genes in single-cell RNA sequencing (scRNA-seq) data can degrade clustering performance. To obtain high-quality genes for clustering, this study proposes a gene selection algorithm based on the Fisher score and a genetic algorithm with dynamic crossover (abbreviated as FDCGA). To reduce time and space complexity, FDCGA first uses the Fisher score to obtain a preliminary set of candidate genes and then applies a genetic algorithm with dynamic crossover to select the genes that benefit data clustering and analysis. Experiments on several publicly available real-world scRNA-seq datasets show that FDCGA outperforms several competing methods in terms of both normalized mutual information (NMI) and adjusted Rand index (ARI), and exhibits strong optimization performance. The convergence experiments show that the fitness of FDCGA increases with the number of iterations and converges to a stable value. Statistical analysis indicates that the improvement of FDCGA over the competing methods is statistically significant.
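
To make the first stage concrete, the following is a minimal sketch of a Fisher-score pre-filter, assuming the commonly used definition (between-group scatter divided by within-group scatter, computed per gene). It is not the authors' implementation: the function names `fisher_scores` and `top_k_genes`, the toy matrix, and the assumption that group labels (true or provisional cluster assignments) are available are all illustrative, and the preview does not state how FDCGA obtains labels or how many genes it keeps.

```python
from collections import defaultdict


def fisher_scores(X, labels, eps=1e-12):
    """Return one Fisher score per gene (column of X).

    X is a list of cells, each a list of expression values; labels gives one
    group label per cell. Score = sum_c n_c * (mean_c - mean)^2
                                  / sum_c n_c * var_c.
    eps is a hypothetical guard against division by zero for constant genes.
    """
    n_genes = len(X[0])
    groups = defaultdict(list)
    for row, lab in zip(X, labels):
        groups[lab].append(row)

    overall_mean = [sum(row[j] for row in X) / len(X) for j in range(n_genes)]

    scores = []
    for j in range(n_genes):
        between = 0.0
        within = 0.0
        for rows in groups.values():
            n_c = len(rows)
            mean_c = sum(r[j] for r in rows) / n_c
            var_c = sum((r[j] - mean_c) ** 2 for r in rows) / n_c
            between += n_c * (mean_c - overall_mean[j]) ** 2
            within += n_c * var_c
        scores.append(between / (within + eps))
    return scores


def top_k_genes(X, labels, k):
    """Indices of the k genes with the highest Fisher score."""
    scores = fisher_scores(X, labels)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data (made up for illustration): 6 cells x 3 genes, two groups.
# Only gene 1 separates the groups.
X = [
    [1.0, 5.0, 0.0],
    [2.0, 5.5, 1.0],
    [1.5, 4.5, 0.5],
    [1.2, 0.5, 0.4],
    [1.8, 1.0, 0.9],
    [1.6, 0.0, 0.2],
]
labels = [0, 0, 0, 1, 1, 1]
selected = top_k_genes(X, labels, k=1)
```

In a pipeline of the kind the abstract describes, the reduced gene set returned by such a filter would become the search space for the genetic algorithm, which is what keeps the second stage tractable.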

Acknowledgements

We would like to thank the anonymous reviewers and all authors of the cited references.

Funding

This research was supported in part by the Natural Science Foundation of Guangxi Province (Grant No. 2021GXNSFAA220076), the National Natural Science Foundation of China (Grant No. 62141207), and the Key Fields Project of Universities in Guangdong Province (Grant Nos. 2021ZDZX4109 and 2020ZDZX3119).

Author information

Contributions

JHF and JZ designed the study; XSZ implemented the experiments; JHW performed the data analysis; JHF and JZ wrote and revised the manuscript. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Jie Zhang or Xiaoshu Zhu.

Ethics declarations

Conflict of interest

The authors declare no potential conflicts of interest with respect to the study, authorship, and publication of this article.

Data availability

The datasets supporting this study are publicly available and can be obtained from EMBL-EBI (https://www.ebi.ac.uk/) or the NCBI Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Feng, J., Zhang, J., Zhu, X. et al. Gene selection and clustering of single-cell data based on Fisher score and genetic algorithm. J Supercomput 79, 7067–7093 (2023). https://doi.org/10.1007/s11227-022-04920-7

  • DOI: https://doi.org/10.1007/s11227-022-04920-7
