Abstract
Advances in high-throughput technologies have accelerated omics research. The recent explosion of omics data, particularly transcriptomic data, has opened new opportunities to discover novel biomarkers with the potential to be incorporated into clinical practice. However, the extreme complexity of these data makes it particularly challenging to extract useful insights from them. Applying machine learning techniques to transcriptomic data is therefore a highly promising route to the discovery of new biomarkers. To explore the potential of these techniques, this paper proposes a novel approach for processing gene expression data with the aim of finding candidate gene signatures. Our methodology consists of an ensemble feature selection strategy based on the Boruta, SVM-RFE and LASSO methods, complemented by a second feature selection step based on the gene importance calculated by the Random Forest, XGBoost, Support Vector Machine, Logistic Regression and AdaBoost methods. In simulations with a dataset of atopic dermatitis patients, our proposal yielded an 8-gene signature with high AUC (0.839) and accuracy (0.8462) values.
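The two-stage selection described above can be sketched as follows. The abstract does not state how the outputs of Boruta, SVM-RFE and LASSO are merged, nor how the importances from the five models are aggregated, so this minimal Python sketch assumes a majority vote followed by a ranking on mean normalized importance. The function names, gene labels and scores are hypothetical placeholders, not the authors' implementation or data.

```python
from collections import Counter


def vote_select(selections, min_votes=2):
    """Stage 1 (assumed rule): keep genes chosen by at least `min_votes` selectors."""
    votes = Counter(gene for genes in selections.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


def rank_by_mean_importance(importances, candidates, top_k):
    """Stage 2 (assumed rule): rank candidates by importance averaged over models.

    Each model's absolute scores are rescaled to sum to 1 over the candidates,
    so models with different score scales contribute comparably.
    """
    totals = {gene: 0.0 for gene in candidates}
    for scores in importances.values():
        norm = sum(abs(scores.get(g, 0.0)) for g in candidates) or 1.0
        for g in candidates:
            totals[g] += abs(scores.get(g, 0.0)) / norm
    mean = {g: total / len(importances) for g, total in totals.items()}
    # Sort by descending mean importance, breaking ties by gene name.
    return sorted(candidates, key=lambda g: (-mean[g], g))[:top_k]


# Placeholder inputs: gene names and scores are invented for illustration.
selections = {
    "boruta": ["G1", "G2", "G3", "G5"],
    "svm_rfe": ["G2", "G3", "G4"],
    "lasso": ["G1", "G3", "G6"],
}
importances = {
    "random_forest": {"G1": 0.5, "G2": 0.3, "G3": 0.2},
    "logistic_regression": {"G1": -1.0, "G2": 0.5, "G3": 2.5},
}

candidates = vote_select(selections)
signature = rank_by_mean_importance(importances, candidates, top_k=2)
```

In practice, the `selections` and `importances` dictionaries would be filled from fitted selectors and classifiers, and `top_k=8` would match the signature size reported here.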
Acknowledgments
This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020, and the PhD grant: 2022.12728.BD.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Duarte, A., Belo, O. (2025). From Omics Data to Candidate Genes: An Innovative Machine Learning Approach for Biomarker Identification. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2024. Lecture Notes in Computer Science, vol 15166. Springer, Cham. https://doi.org/10.1007/978-3-031-81596-6_24
DOI: https://doi.org/10.1007/978-3-031-81596-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-81595-9
Online ISBN: 978-3-031-81596-6
eBook Packages: Computer Science (R0)