Abstract
Aging is one of the main unsolved problems of modern biology. It affects almost all living species and results from multiple interactions between genetic and environmental factors. Numerous studies have shown that DNA methylation (DNAm) changes are among the most stable biomarkers for predicting biological age, which has a complex relationship with chronological age. This makes the selection of age-related CpG sites important. Most feature selection methods proposed in this field are problem-dependent techniques for finding important age-related CpG sites. In this study, we instead propose a general-purpose, problem-independent framework. This adaptive framework finds the best sequence of feature selection methods, and the number of features selected at each step, for the dataset in use. To evaluate the framework, we used two groups of DNA methylation datasets from healthy samples, one from blood tissue and one from non-blood tissues. We compared the results of the framework with four studies in terms of mean absolute deviation (MAD) and correlation (R2), separately on the blood and non-blood datasets. Our framework achieved a MAD of 3.9 years and 5.33 years on the blood and non-blood test datasets, respectively. It also achieved a correlation (R2) of 95.24% and 91.92% between chronological age and DNAm-predicted age on the blood and non-blood test datasets, respectively. The experimental results show that the proposed framework adaptively finds a feature selection method suited to the data, with acceptable performance compared with other studies.
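The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that MAD is the mean of the absolute differences between predicted and chronological age, and that R2 is the squared Pearson correlation between the two; the abstract does not spell out either formula, so both definitions are assumptions. The function names and the sample ages are hypothetical and are not taken from the study.

```python
def mean_absolute_deviation(actual, predicted):
    """Mean of |predicted - actual|, in the same units as the ages (years)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def squared_pearson(actual, predicted):
    """Squared Pearson correlation between two equally long sequences.

    Assumption: the abstract's "correlation (R2)" is read here as r squared.
    """
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_a * var_p)


# Hypothetical ages in years, made up for illustration only.
chronological = [20.0, 35.0, 50.0, 65.0]
predicted = [22.0, 33.0, 53.0, 61.0]

mad = mean_absolute_deviation(chronological, predicted)
r2 = squared_pearson(chronological, predicted)
print(f"MAD = {mad:.2f} years, R2 = {100 * r2:.2f}%")
```

Under these definitions a lower MAD and a higher R2 both indicate that predicted age tracks chronological age more closely, which is how the blood and non-blood figures above would be read.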
Data availability
All used datasets in this work are free and public.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
There is no conflict of interest regarding this manuscript.
About this article
Cite this article
Momeni, Z., Saniee Abadeh, M. Adaptive feature selection framework for DNA methylation-based age prediction. Soft Comput 26, 3777–3788 (2022). https://doi.org/10.1007/s00500-022-06844-z
DOI: https://doi.org/10.1007/s00500-022-06844-z