An Ensemble Feature Selection Framework for the Early Non-invasive Prediction of Parkinson’s Disease from Imbalanced Microarray Data

  • Conference paper
Advances in Computing and Data Sciences (ICACDS 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1614)

Abstract

Early identification of Parkinson’s disease (PD) by non-invasive means is essential for slowing disease progression with appropriate therapy. One route is to analyse microarray gene expression data from blood samples. This study proposes a computational framework for predicting PD from such blood-based microarray gene expression data. The framework has three stages: pre-processing; data balancing and feature reduction; and prediction. In the pre-processing stage, annotation, cross-platform normalisation, and integration were performed. Balanced subsets were then created by applying k-means clustering to the majority-class samples together with random undersampling. In the feature reduction stage, an ANOVA filter extracted critical features from the balanced subsets; in the prediction stage, several cost-sensitive classification models and an ensemble model were built. On independent test data, the method achieved an AUC of 82.6% with the cost-sensitive logistic regression classifier and 83.2% with the ensemble model. These results indicate that the proposed framework could be effective for diagnosing PD at an early stage.
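The balancing and feature reduction stage can be illustrated with a small sketch. The abstract does not say how the k-means clusters and the random undersampling are combined, nor how features chosen on different subsets are merged, so everything below is an assumption made for illustration: the majority class is clustered, each balanced subset draws majority samples at random in turn from every cluster until it matches the minority count, a one-way ANOVA F-score ranks features per subset, and features are kept if enough subsets vote for them. The function names (`kmeans`, `balanced_subsets`, `anova_f`, `ensemble_select`), the parameters `top_k` and `min_votes`, and the toy data are all hypothetical and are not taken from the paper. The cost-sensitive classifiers and the ensemble model are not covered.

```python
import random
from collections import Counter
from statistics import mean


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [mean(col) for col in zip(*members)]
    return assign


def balanced_subsets(majority, minority, k=2, n_subsets=3, seed=0):
    """Assumed scheme: every subset holds all minority rows plus an equally
    sized random draw from the majority rows, taken in turn from each cluster."""
    rng = random.Random(seed)
    assign = kmeans(majority, k, seed=seed)
    clusters = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
    target = min(len(minority), len(majority))
    subsets = []
    for _ in range(n_subsets):
        pools = [rng.sample(idx, len(idx)) for idx in clusters if idx]  # shuffled copies
        picked = []
        while len(picked) < target:
            for pool in pools:
                if pool and len(picked) < target:
                    picked.append(pool.pop())
        X = [majority[i] for i in picked] + list(minority)
        y = [0] * len(picked) + [1] * len(minority)
        subsets.append((X, y))
    return subsets


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, g = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(m) * (mean(m) - grand) ** 2 for m in groups.values())
    ss_within = sum((v - mean(m)) ** 2 for m in groups.values() for v in m)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def ensemble_select(subsets, top_k, min_votes):
    """Assumed voting rule: keep features ranked in the top_k on >= min_votes subsets."""
    votes = Counter()
    for X, y in subsets:
        n_feat = len(X[0])
        scores = [anova_f([row[j] for row in X], y) for j in range(n_feat)]
        ranked = sorted(range(n_feat), key=lambda j: scores[j], reverse=True)
        votes.update(ranked[:top_k])
    return sorted(j for j, v in votes.items() if v >= min_votes)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
majority = [[0.1 * i, float((i * 7) % 5)] for i in range(12)]
minority = [[5.0 + 0.1 * i, float((i * 3) % 5)] for i in range(4)]

subsets = balanced_subsets(majority, minority, k=2, n_subsets=3, seed=0)
selected = ensemble_select(subsets, top_k=1, min_votes=2)
print(selected)
```

Drawing from every cluster, rather than from the majority class as a whole, is one way to keep each undersampled subset representative of the majority class's internal structure; whether the paper allocates draws equally, proportionally, or otherwise is not stated in the abstract.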


Author information

Corresponding author

Correspondence to Jisha Augustine.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Augustine, J., Jereesh, A.S. (2022). An Ensemble Feature Selection Framework for the Early Non-invasive Prediction of Parkinson’s Disease from Imbalanced Microarray Data. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2022. Communications in Computer and Information Science, vol 1614. Springer, Cham. https://doi.org/10.1007/978-3-031-12641-3_1

  • DOI: https://doi.org/10.1007/978-3-031-12641-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12640-6

  • Online ISBN: 978-3-031-12641-3

  • eBook Packages: Computer Science (R0)
