Advantages of Oversampling Techniques: A Case Study in Risk Factors for Fall Prediction

  • Conference paper
  • First Online:
Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE 2021, ICT4AWE 2022)

Abstract

The evaluation of risk factors for falls (RFF) is a key element of fall prevention for the elderly. Since the main actionable RFF cannot always be regularly re-assessed by medical professionals, predicting them automatically would make it possible to provide useful recommendations for reducing the risk of falls. This article explores the advantages of three oversampling methods for improving the prediction of 12 target RFF on a real, imbalanced data set. We first present the data set, the selection of 45 variables and 12 target variables, and the other pre-processing steps. Second, we present the three oversampling methods (SMOTE, SVM-SMOTE, and ADASYN), the classifiers (Logistic Regression, Random Forest, Bayesian Network, Artificial Neural Network, and Naive Bayes), and the quality measures used in this study (balanced accuracy, area under the ROC curve, area under the Precision-Recall curve, F1 score, and F2 score). Each target is evaluated in turn from all other variables. Results are reported by classifier (averaging over targets) and by target (averaging over classifiers), for each oversampling method and quality measure. Finally, statistical tests confirm the benefit of oversampling: all three methods show a clear advantage over training on the imbalanced data, with SVM-SMOTE providing the largest improvement.
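The sketch below illustrates the kind of pipeline the abstract describes, for one binary target: oversample the minority class with SVM-SMOTE (or SMOTE/ADASYN), fit a classifier, and score with balanced accuracy, ROC AUC, Precision-Recall AUC, F1, and F2. It is a minimal illustration, not the authors' code; it assumes scikit-learn and imbalanced-learn are available, uses a Random Forest as one of the listed classifiers, and stands in random toy data for the real RFF data set.

```python
# Minimal sketch (not the authors' implementation): oversampling an imbalanced
# binary target before classification. X and y are hypothetical placeholders
# for the pre-processed predictors and one target risk factor for falls.
import numpy as np
from imblearn.over_sampling import SMOTE, SVMSMOTE, ADASYN
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.metrics import make_scorer, fbeta_score

# Toy imbalanced data standing in for the real data set (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 45))
y = (rng.random(500) < 0.1).astype(int)  # roughly 10% minority class

# Oversampling is placed inside the pipeline so that, under cross-validation,
# synthetic minority samples are generated from the training fold only.
pipe = Pipeline([
    ("oversample", SVMSMOTE(random_state=0)),   # swap in SMOTE() or ADASYN()
    ("clf", RandomForestClassifier(random_state=0)),
])

# The quality measures listed in the abstract.
scoring = {
    "balanced_accuracy": "balanced_accuracy",
    "roc_auc": "roc_auc",
    "pr_auc": "average_precision",
    "f1": "f1",
    "f2": make_scorer(fbeta_score, beta=2),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
for name in scoring:
    print(name, scores[f"test_{name}"].mean().round(3))
```

In the study, this kind of evaluation would be repeated for each of the 12 target RFF and each classifier, with results then averaged by target and by classifier.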



Author information


Corresponding author

Correspondence to Gulshan Sihag.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sihag, G. et al. (2023). Advantages of Oversampling Techniques: A Case Study in Risk Factors for Fall Prediction. In: Maciaszek, L.A., Mulvenna, M.D., Ziefle, M. (eds) Information and Communication Technologies for Ageing Well and e-Health. ICT4AWE 2021, ICT4AWE 2022. Communications in Computer and Information Science, vol 1856. Springer, Cham. https://doi.org/10.1007/978-3-031-37496-8_4


  • DOI: https://doi.org/10.1007/978-3-031-37496-8_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37495-1

  • Online ISBN: 978-3-031-37496-8

  • eBook Packages: Computer Science, Computer Science (R0)
