
Integrating MTS with bagging strategy for class imbalance problems

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Class imbalance is a common problem in classification tasks. Most classification algorithms are trained to optimize overall accuracy, so important but rarely occurring examples tend to be neglected. The Mahalanobis–Taguchi system (MTS) has been shown to be robust to class imbalance owing to the way it constructs its classification model. Bagging has often been applied as an effective strategy for reducing the learning bias of classification algorithms. In this study, we propose MTSbag, which integrates MTS with bagging-based ensemble learning to enhance the ability of conventional MTS to handle imbalanced data. Numerical experiments on multiple datasets with various class imbalance levels demonstrate the effectiveness of MTSbag, especially for highly imbalanced datasets. Finally, as a healthcare application, an early warning system for in-hospital cardiac arrest was successfully implemented by leveraging the minority class identification ability of MTSbag.
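The abstract does not spell out how MTSbag combines its base models, so the following is only a minimal sketch of the general idea, under stated assumptions: each base model is a Mahalanobis space built from a bootstrap sample of the reference (majority) group, and the ensemble score is the mean of the scaled Mahalanobis distances. The function names, the number of estimators, the averaging rule, and the quantile threshold are all illustrative choices, not the authors' published procedure.

```python
import numpy as np


def fit_mahalanobis_space(normal):
    """Build a Mahalanobis space from reference ("normal") observations."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    z = (normal - mean) / std
    corr = np.corrcoef(z, rowvar=False)
    # The pseudo-inverse guards against a near-singular correlation matrix.
    return mean, std, np.linalg.pinv(corr)


def scaled_md(x, space):
    """Scaled Mahalanobis distance: z^T R^{-1} z divided by the feature count."""
    mean, std, corr_inv = space
    z = (x - mean) / std
    k = z.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k


def fit_bagged_spaces(normal, n_estimators=25, seed=0):
    """Fit one Mahalanobis space per bootstrap resample of the normal group."""
    rng = np.random.default_rng(seed)
    n = normal.shape[0]
    spaces = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        spaces.append(fit_mahalanobis_space(normal[idx]))
    return spaces


def bagged_md(x, spaces):
    """Assumed aggregation rule: average the distances over all spaces."""
    return np.mean([scaled_md(x, s) for s in spaces], axis=0)


# Synthetic, highly imbalanced toy data (500 normal vs. 10 abnormal rows).
rng = np.random.default_rng(42)
normal = rng.normal(0.0, 1.0, size=(500, 4))
abnormal = rng.normal(4.0, 1.0, size=(10, 4))

spaces = fit_bagged_spaces(normal)
md_normal = bagged_md(normal, spaces)
md_abnormal = bagged_md(abnormal, spaces)

# Illustrative threshold: flag anything beyond the 95th percentile of
# the normal group's distances as a minority-class (abnormal) case.
threshold = np.quantile(md_normal, 0.95)
flagged = md_abnormal > threshold
```

Because only the reference group is used to build each space, the minority class size does not affect model fitting in this sketch, which is one plausible reading of why a distance-based scheme can cope with imbalance. In practice the threshold would be tuned on labelled data rather than fixed at a quantile.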



Author information

Corresponding author

Correspondence to Yu-Hsiang Hsiao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hsiao, YH., Su, CT. & Fu, PC. Integrating MTS with bagging strategy for class imbalance problems. Int. J. Mach. Learn. & Cyber. 11, 1217–1230 (2020). https://doi.org/10.1007/s13042-019-01033-1

  • DOI: https://doi.org/10.1007/s13042-019-01033-1
