
Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1153))

Abstract

This paper studies the problem of class imbalance in medical datasets. Modern machine learning techniques are increasingly popular for this type of problem, with examples in the areas of health and medicine. One of the major difficulties with these techniques is that the datasets handled are often highly imbalanced. Under-sampling and over-sampling techniques are used to work around this problem. In this paper, we apply random forests, which are combinations of decision trees fitted to subsamples of the data, built using under-sampling and over-sampling. We then compare the fit metrics obtained across the various model specifications tested and evaluate their results both in and out of sample. We observed that random forests using imbalanced sub-samples smaller than the original sample presented the best performance among the random forests tested, and an improvement over results previously obtained on the medical dataset.
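The combination the abstract describes can be illustrated concretely. The sketch below is a minimal, hedged example (not the authors' exact method): it builds a random forest on a randomly under-sampled training subset of a synthetic imbalanced dataset, assuming scikit-learn is available; the dataset, class ratio, and forest parameters are all illustrative choices.

```python
# Sketch: random forest trained on an under-sampled subset of an
# imbalanced binary dataset. Illustrative only, assuming scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% minority (positive) class.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          random_state=0)

# Random under-sampling: keep every minority row and draw an equal
# number of majority rows, giving a balanced (and smaller) subset.
rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)
keep = np.concatenate(
    [minority, rng.choice(majority, size=minority.size, replace=False)])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr[keep], y_tr[keep])

# Minority-class recall is the kind of metric under-sampling targets;
# plain accuracy is misleading at a 95/5 class ratio.
print(round(recall_score(y_te, clf.predict(X_te)), 2))
```

Over-sampling would instead enlarge the minority side (e.g. by duplication, or synthetically as in SMOTE [24]); both approaches rebalance what each tree in the forest sees during fitting.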



Corresponding author: Engy El-shafeiy.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

El-shafeiy, E., Abohany, A. (2020). Medical Imbalanced Data Classification Based on Random Forests. In: Hassanien, AE., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol 1153. Springer, Cham. https://doi.org/10.1007/978-3-030-44289-7_8
