Class Imbalance in the Prediction of Dementia from Neuropsychological Data

Conference paper
Progress in Artificial Intelligence (EPIA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8154)

Abstract

Class imbalance affects medical diagnosis, as disease cases are typically outnumbered by healthy controls. When the imbalance is severe, learning algorithms fail to retrieve the rarer classes, and common assessment metrics become uninformative. In this work, class imbalance is addressed using neuropsychological data, with the aim of differentiating Alzheimer’s Disease (AD) from Mild Cognitive Impairment (MCI) and of predicting the conversion from MCI to AD. The effect of the imbalance on four learning algorithms is examined through the application of bagging, Bayes risk minimization and MetaCost. Plain decision trees were always outperformed, indicating susceptibility to the imbalance. The naïve Bayes classifier was robust but exhibited a bias that was corrected through risk minimization. This strategy outperformed all other combinations of classifiers and meta-learning/ensemble methods. The tree-augmented naïve Bayes classifier also benefited from an adjustment of the decision threshold. On the nearly balanced datasets, it was improved by bagging, suggesting that its tree structure imposed stronger attribute dependencies than the data actually exhibit. Support vector machines were robust, as their plain version achieved good results and was never outperformed.
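
The risk-minimization strategy highlighted in the abstract amounts to moving a probabilistic classifier's decision threshold according to the misclassification costs, rather than always predicting the most probable class. The sketch below illustrates the idea on synthetic imbalanced data. It is a minimal sketch assuming scikit-learn; the 9:1 cost ratio and the synthetic dataset are illustrative and are not taken from the paper.

# Minimal sketch of cost-sensitive threshold adjustment for naive Bayes.
# The data and the 9:1 cost ratio are illustrative, not the paper's.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import balanced_accuracy_score

# Imbalanced synthetic data standing in for the neuropsychological datasets.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
p_minority = nb.predict_proba(X_te)[:, 1]  # posterior of the rare class

# Bayes risk minimization: with false-negative cost C_fn and false-positive
# cost C_fp, predicting the minority class minimizes expected cost whenever
# its posterior exceeds C_fp / (C_fp + C_fn), rather than the default 0.5.
C_fn, C_fp = 9.0, 1.0             # hypothetical costs
threshold = C_fp / (C_fp + C_fn)  # = 0.1
y_hat = (p_minority >= threshold).astype(int)

print("balanced acc, default 0.5 threshold:",
      balanced_accuracy_score(y_te, nb.predict(X_te)))
print("balanced acc, risk-minimizing threshold:",
      balanced_accuracy_score(y_te, y_hat))

For a 9:1 cost ratio the threshold drops from 0.5 to 0.1, which counteracts the bias a naïve Bayes model acquires from a skewed class prior.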




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nunes, C., Silva, D., Guerreiro, M., de Mendonça, A., Carvalho, A.M., Madeira, S.C. (2013). Class Imbalance in the Prediction of Dementia from Neuropsychological Data. In: Correia, L., Reis, L.P., Cascalho, J. (eds.) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science (LNAI), vol. 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_13

  • DOI: https://doi.org/10.1007/978-3-642-40669-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40668-3

  • Online ISBN: 978-3-642-40669-0
