Abstract
Early, non-invasive identification of Parkinson’s disease (PD) is essential so that appropriate therapy can slow disease progression. One way to achieve this is to analyse microarray gene expression data from blood samples. This study proposes a computational framework for predicting PD from blood-based microarray gene expression data. The proposed system has three stages: pre-processing, data balancing and feature reduction, and prediction. In the pre-processing stage, annotation, cross-platform normalisation, and integration were performed. Balanced subsets were then created using k-means clustering on the majority-class samples and random undersampling. In the feature reduction stage, an ANOVA filter selected the most informative features from the balanced subsets; in the prediction stage, several cost-sensitive classification models and an ensemble model were built. On independent test data, the method achieved an AUC of 82.6% with the cost-sensitive logistic regression classifier and 83.2% with the ensemble model. The experimental results indicate that the suggested framework could be effective for diagnosing PD at an early stage.
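The balancing, feature reduction, and prediction stages described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and uses synthetic data in place of the integrated expression matrix. The function names (`balanced_subsets`, `fit_ensemble`, `predict_proba_ensemble`), the number of clusters, subsets, and selected features, and the soft-voting average used for the ensemble are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def balanced_subsets(X, y, n_subsets=5, n_clusters=3, seed=0):
    """Yield balanced (X, y) subsets: every minority sample plus a
    cluster-stratified random undersample of the majority class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    # Cluster only the majority samples so each subset covers its sub-groups.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X[maj_idx])
    for _ in range(n_subsets):
        picked = []
        for c in range(n_clusters):
            members = maj_idx[labels == c]
            # Draw from each cluster in proportion to its size.
            k = max(1, round(len(min_idx) * len(members) / len(maj_idx)))
            k = min(k, len(members))
            picked.append(rng.choice(members, size=k, replace=False))
        idx = np.concatenate([min_idx, *picked])
        yield X[idx], y[idx]


def fit_ensemble(X, y, k_features=20, **subset_kwargs):
    """Fit one ANOVA-filtered, cost-sensitive model per balanced subset."""
    models = []
    for X_sub, y_sub in balanced_subsets(X, y, **subset_kwargs):
        model = make_pipeline(
            StandardScaler(),
            SelectKBest(f_classif, k=k_features),  # ANOVA F-test filter
            LogisticRegression(class_weight="balanced", max_iter=1000),
        )
        models.append(model.fit(X_sub, y_sub))
    return models


def predict_proba_ensemble(models, X):
    """Average the positive-class probabilities (assumed soft voting)."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic, imbalanced stand-in for an expression matrix (samples x genes).
X, y = make_classification(
    n_samples=400, n_features=200, n_informative=15,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = fit_ensemble(X_train, y_train)
scores = predict_proba_ensemble(models, X_test)
auc = roc_auc_score(y_test, scores)
print(f"Ensemble AUC on held-out synthetic data: {auc:.3f}")
```

The AUC printed here reflects only the synthetic data and says nothing about the figures reported in the abstract. Here `class_weight="balanced"` stands in for cost-sensitive learning. Fitting the filter inside each pipeline keeps feature selection restricted to the training subset, so no information from the held-out samples leaks into it.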
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Augustine, J., Jereesh, A.S. (2022). An Ensemble Feature Selection Framework for the Early Non-invasive Prediction of Parkinson’s Disease from Imbalanced Microarray Data. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2022. Communications in Computer and Information Science, vol 1614. Springer, Cham. https://doi.org/10.1007/978-3-031-12641-3_1
DOI: https://doi.org/10.1007/978-3-031-12641-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12640-6
Online ISBN: 978-3-031-12641-3
eBook Packages: Computer Science, Computer Science (R0)