From Omics Data to Candidate Genes: An Innovative Machine Learning Approach for Biomarker Identification

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2024)

Abstract

Advances in high-throughput technologies have accelerated omics research. The recent explosion of omics data, particularly transcriptomic data, has opened new opportunities for discovering novel biomarkers that could be incorporated into clinical practice. However, the extreme complexity of these data makes extracting useful insights particularly challenging. Applying machine learning techniques to transcriptomic data is therefore a highly promising route to discovering new biomarkers. To explore the potential of these techniques, this paper proposes a novel approach for processing gene expression data with the aim of finding candidate gene signatures. Our methodology consists of an ensemble feature selection strategy based on the Boruta, SVM-RFE and LASSO methods, complemented by a second feature selection stage based on gene importance as calculated by the Random Forest, XGBoost, Support Vector Machine, Logistic Regression and AdaBoost methods. In simulations with a dataset of atopic dermatitis patients, our proposal yielded an 8-gene signature with high AUC (0.839) and accuracy (0.8462) values.
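The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the data are synthetic, Boruta is replaced by a simple top-k Random Forest importance filter, XGBoost is omitted so that the example depends only on scikit-learn, and the combination rules (a majority vote in stage one, a mean of normalised importances in stage two) are not stated in the abstract. The function names `stage_one_votes` and `stage_two_ranking` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def stage_one_votes(X, y, n_keep=20, seed=0):
    """Count, for every feature, how many of three selectors picked it."""
    # SVM-RFE: recursive elimination driven by linear-SVM weights,
    # dropping 10% of the features at each iteration.
    svm_rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=0.1)
    svm_mask = svm_rfe.fit(X, y).support_

    # LASSO fitted on the 0/1 labels (a simplification; a logistic LASSO
    # is another option). Features with non-zero coefficients are kept.
    lasso_mask = Lasso(alpha=0.02).fit(X, y).coef_ != 0

    # Stand-in for Boruta (assumption): keep the n_keep features with the
    # highest Random Forest importance.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    rf_mask = np.zeros(X.shape[1], dtype=bool)
    rf_mask[np.argsort(forest.feature_importances_)[-n_keep:]] = True

    return svm_mask.astype(int) + lasso_mask.astype(int) + rf_mask.astype(int)


def stage_two_ranking(X, y, candidates, n_final=8, seed=0):
    """Rank candidate features by mean normalised importance across models."""
    Xc = X[:, candidates]
    models = [
        RandomForestClassifier(n_estimators=200, random_state=seed),
        AdaBoostClassifier(n_estimators=50, random_state=seed),
        SVC(kernel="linear"),
        LogisticRegression(max_iter=1000),
    ]
    scores = np.zeros(len(candidates))
    for model in models:
        model.fit(Xc, y)
        if hasattr(model, "feature_importances_"):
            importance = model.feature_importances_
        else:
            # Linear models: use absolute coefficient size as importance.
            importance = np.abs(model.coef_).ravel()
        scores += importance / importance.sum()
    top = np.argsort(scores)[::-1][:n_final]
    return candidates[top]


# Synthetic stand-in for a gene expression matrix (samples x genes).
X, y = make_classification(
    n_samples=150, n_features=100, n_informative=8,
    n_redundant=0, class_sep=1.5, random_state=0,
)
X = StandardScaler().fit_transform(X)

votes = stage_one_votes(X, y)
candidates = np.flatnonzero(votes >= 2)  # majority vote (assumed rule)
signature = stage_two_ranking(X, y, candidates)
print("Candidate features:", candidates)
print("Final signature (column indices):", signature)
```

In a real analysis, both stages would need to run inside a cross-validation loop on the training folds only. Otherwise the reported AUC and accuracy would be optimistically biased by selecting features on the same samples used for evaluation.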

Acknowledgments

This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020, and the PhD grant: 2022.12728.BD.

Author information

Corresponding author

Correspondence to Ana Duarte.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Duarte, A., Belo, O. (2025). From Omics Data to Candidate Genes: An Innovative Machine Learning Approach for Biomarker Identification. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2024. Lecture Notes in Computer Science, vol 15166. Springer, Cham. https://doi.org/10.1007/978-3-031-81596-6_24

  • DOI: https://doi.org/10.1007/978-3-031-81596-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-81595-9

  • Online ISBN: 978-3-031-81596-6

  • eBook Packages: Computer Science, Computer Science (R0)
