Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm

  • Conference paper
  • Learning and Intelligent Optimization (LION 2021)

Abstract

Big data methods prevail in the biomedical domain, leading to effective and scalable data-driven approaches. Biomedical data are known for their ultra-high dimensionality, especially those coming from molecular biology experiments. This property is shared by the emerging technique of single-cell RNA-sequencing (scRNA-seq), where we obtain sequence information from individual cells. A reliable way to uncover the complexity of such data is to use Machine Learning approaches, including dimensionality reduction and feature selection methods. Although the former has made remarkable progress on scRNA-seq data, only the latter can offer deeper interpretability at the gene level, since it highlights the dominant gene features in the given data. To tackle this challenge, we propose a feature selection framework that utilizes genetic optimization principles and identifies low-dimensional combinations of gene lists in order to enhance the classification performance of any off-the-shelf classifier (e.g., LDA or SVM). Our intuition is that by identifying an optimal gene subset, we can enhance the predictive power of scRNA-seq data even if these genes are unrelated to each other. We showcase our proposed framework’s effectiveness in two real scRNA-seq experiments with gene dimensions up to 36708. Our framework can identify very low-dimensional subsets of genes (fewer than 200) while boosting the classifiers’ performance. Finally, we provide a biological interpretation of the selected genes, thus providing evidence of our method’s utility towards explainable artificial intelligence.
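The general idea of genetic feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not show their operators, fitness function, or hyperparameters, so everything below is an assumption made for illustration. Each candidate is a Boolean mask over genes, fitness is the training accuracy of a simple nearest-centroid classifier (standing in for LDA or SVM) minus a hypothetical per-gene penalty, and the operators are truncation selection, one-point crossover, bit-flip mutation, and single-individual elitism. The names `nearest_centroid_accuracy`, `fitness`, `select_features`, and all parameter defaults are hypothetical.

```python
import random

def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier restricted to the
    genes where mask is True (a stand-in for LDA/SVM, not the paper's choice)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    # Accumulate per-class sums over the selected columns, then average.
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for row, label in zip(X, y):
        # Predict the class whose centroid is closest in squared distance.
        pred = min(
            centroids,
            key=lambda c: sum((row[j] - centroids[c][k]) ** 2
                              for k, j in enumerate(cols)),
        )
        correct += pred == label
    return correct / len(y)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus an assumed penalty per selected gene (favors small subsets)."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Evolve Boolean gene masks; requires at least 2 genes and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each gene with small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(X, y, m))
```

Calling `select_features(X, y)` on a list of expression rows `X` and class labels `y` returns one Boolean mask; the indices where it is `True` form the selected gene subset. For real use, the training accuracy in this sketch would have to be replaced by a held-out or cross-validated score, since accuracy measured on the same cells used to fit the classifier overstates performance.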

Notes

  1. We define \(\mathbb{B}\) as the space of Boolean variables.

  2. We have 20 runs/replicates.

Author information

Corresponding author

Correspondence to Konstantinos I. Chatzilygeroudis.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Chatzilygeroudis, K.I., Vrahatis, A.G., Tasoulis, S.K., Vrahatis, M.N. (2021). Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm. In: Simos, D.E., Pardalos, P.M., Kotsireas, I.S. (eds) Learning and Intelligent Optimization. LION 2021. Lecture Notes in Computer Science, vol 12931. Springer, Cham. https://doi.org/10.1007/978-3-030-92121-7_6

  • DOI: https://doi.org/10.1007/978-3-030-92121-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92120-0

  • Online ISBN: 978-3-030-92121-7

  • eBook Packages: Computer Science, Computer Science (R0)
