Abstract
The evaluation of risk factors for falls (RFF) is a key element of fall prevention for the elderly. Since information on the main actionable RFF cannot always be regularly re-evaluated by medical factors, their automatic prediction would make it possible to provide useful recommendations to reduce the risk of falls. This article explores the advantages of three oversampling methods for improving the prediction quality of 12 target RFF on a real imbalanced data set. We first present the data set, together with the selection of 45 variables and 12 target variables and the other pre-processing steps. Second, we present the three oversampling methods (SMOTE, SVM-SMOTE, and ADASYN), the classifiers (Logistic Regression, Random Forest, Bayesian Network, Artificial Neural Network, and Naive Bayes), and the quality measures used in this study (balanced accuracy, area under the ROC curve, area under the Precision-Recall curve, and F1 and F2 scores). Each target is evaluated in turn from all other variables. Results are presented by classifier (averaging over targets) and by target (averaging over classifiers), for each oversampling method and quality measure. Finally, statistical tests validate the benefit of using oversampling methods. The three methods show a clear advantage over the original imbalanced data set, and SVM-SMOTE provides the largest improvement.
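To make the terminology concrete, the following minimal sketch illustrates the interpolation idea behind SMOTE and three of the threshold-based quality measures named above (balanced accuracy, F1, and F2). It is not the pipeline used in the study: the function names, the toy data, and the parameter `k` are assumptions introduced for illustration, the code uses only the Python standard library, and it assumes binary labels with both classes present.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Hypothetical helper: build n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours, which
    is the core idea of SMOTE (not a full implementation).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes occur)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f_beta(y_true, y_pred, beta):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


if __name__ == "__main__":
    # Toy two-dimensional minority class (illustrative values only).
    minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (1.2, 1.8)]
    print(smote_like(minority, n_new=3))

    # Toy imbalanced labels: 3 positives, 7 negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
    print(balanced_accuracy(y_true, y_pred))
    print(f_beta(y_true, y_pred, beta=1), f_beta(y_true, y_pred, beta=2))
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. SVM-SMOTE and ADASYN differ mainly in which minority samples they choose as starting points, which this sketch does not attempt to reproduce.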
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sihag, G. et al. (2023). Advantages of Oversampling Techniques: A Case Study in Risk Factors for Fall Prediction. In: Maciaszek, L.A., Mulvenna, M.D., Ziefle, M. (eds) Information and Communication Technologies for Ageing Well and e-Health. ICT4AWE 2021, 2022. Communications in Computer and Information Science, vol 1856. Springer, Cham. https://doi.org/10.1007/978-3-031-37496-8_4
DOI: https://doi.org/10.1007/978-3-031-37496-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37495-1
Online ISBN: 978-3-031-37496-8
eBook Packages: Computer Science, Computer Science (R0)