HBS–STACK: hierarchical biomarker selection and stacked ensemble model for biomarker identification and cancer prediction on multi-omics

Dhillon, Arwinder; Singh, Ashima; Bhalla, Vinod Kumar

doi:10.1007/s00521-023-09359-2

Original Article
Published: 03 January 2024

Volume 36, pages 5413–5431, (2024)

Neural Computing and Applications

Abstract

Advances in genomic and transcriptomic data have opened new prospects for biomarker identification and cancer prediction. However, existing biomarker discovery and cancer diagnosis techniques struggle to capture the complex, nonlinear associations in biological datasets. Machine learning offers considerable potential for building feature selection techniques and models that identify cancer biomarkers. In this article, we propose a Hierarchical Biomarker Selection and Stacked Ensemble model for Biomarker Identification and Cancer Prediction (HBS–STACK) on miRNA, gene expression, and DNA methylation (DM) datasets. Biomarker selection proceeds in three stages: stage 1 aggregates information between CpG sites and genes by considering their biological relations, stage 2 applies Fold Change and False Discovery Rate selection, and stage 3 applies Light Gradient Boosting Machine with Recursive Feature Elimination (LBGMRFE) selection. The selected features and markers are integrated and passed to a stacked ML model comprising Gradient Boosting Machine (GBM), Naïve Bayes (NB), and Random Forest (RF) at level 1 learning and a deep neural network (DNN) at level 2 learning. HBS–STACK is evaluated on breast cancer (BRCA) and validated on kidney renal clear cell carcinoma (KIRC) from The Cancer Genome Atlas (TCGA) portal and on Alzheimer's disease. We found several genomic and transcriptomic biomarkers, including IQSEC1 for BRCA; ZFHX3, CTBP2, and SLC9AR2 for KIRC; and TMEM61 for Alzheimer's disease. The experimental results show that HBS–STACK outperformed GBM, NB, and RF, with 99.60, 99.03, and 92.05% accuracy and an improvement of 2.27, 26.03, and 10.05% over existing techniques on BRCA, KIRC, and Alzheimer's disease, respectively.
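The stage 2 filter named in the abstract (Fold Change plus False Discovery Rate selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the cutoffs (`fc_cutoff`, `fdr_cutoff`), the pseudocount, the feature names, the toy expression values, and the assumption that per-feature p-values are already available are all hypothetical choices made for illustration. The FDR step uses the standard Benjamini-Hochberg adjustment, which the abstract does not specify.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean expression (tumor over normal).

    The pseudocount is an illustrative assumption to avoid division by zero.
    """
    mean_tumor = sum(tumor) / len(tumor)
    mean_normal = sum(normal) / len(normal)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def stage2_filter(features, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep features passing both the fold-change and the FDR cutoff.

    features maps a name to (tumor_values, normal_values, p_value).
    Both cutoffs are hypothetical defaults, not values from the paper.
    """
    names = list(features)
    q_values = benjamini_hochberg([features[n][2] for n in names])
    kept = []
    for name, q_value in zip(names, q_values):
        tumor, normal, _ = features[name]
        large_change = abs(log2_fold_change(tumor, normal)) >= fc_cutoff
        if large_change and q_value <= fdr_cutoff:
            kept.append(name)
    return kept


# Toy data with hypothetical feature names and made-up values.
features = {
    "gene_a": ([30, 34, 32], [7, 8, 6], 0.001),     # up in tumor
    "gene_b": ([10, 11, 9], [10, 9, 11], 0.80),     # unchanged
    "gene_c": ([3, 2, 4], [15, 17, 16], 0.004),     # down in tumor
    "gene_d": ([12, 13, 11], [10, 10, 10], 0.03),   # significant, small change
}
selected = stage2_filter(features)
print(selected)
```

In this toy run, `gene_d` clears the FDR cutoff but not the fold-change cutoff, which shows why the two criteria are applied together: a feature has to be both statistically significant and changed by a large enough margin to move on to the next stage.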

Data availability

The datasets will be made available on reasonable request.

Funding

The authors have no funding to report.

Author information

Authors and Affiliations

Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Arwinder Dhillon, Ashima Singh & Vinod Kumar Bhalla

Corresponding author

Correspondence to Arwinder Dhillon.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical standards

The authors declare that this article complies with ethical standards.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix I: Algorithm for proposed HBS–STACK

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dhillon, A., Singh, A. & Bhalla, V.K. HBS–STACK: hierarchical biomarker selection and stacked ensemble model for biomarker identification and cancer prediction on multi-omics. Neural Comput & Applic 36, 5413–5431 (2024). https://doi.org/10.1007/s00521-023-09359-2

Received: 04 May 2022
Accepted: 07 December 2023
Published: 03 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00521-023-09359-2

Keywords
