Evolutionary Cost-Sensitive Balancing: A Generic Method for Imbalanced Classification Problems

  • Conference paper
EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation VI

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 674)

Abstract

Efficient classification under imbalanced class distributions is of current interest in data mining research, because traditional learning methods often fail to achieve satisfactory results in such domains. The choice of evaluation metric is also essential to the recognition effort. This paper presents a new general methodology for improving the performance of classifiers on imbalanced problems. The method, Evolutionary Cost-Sensitive Balancing (ECSB), is a meta-approach that can be employed with any error-reduction classifier. It uses genetic search and cost-sensitive mechanisms to boost the performance of the base classifier. We present evaluations on benchmark data, comparing the results obtained by ECSB with those of similar recent methods in the literature: SMOTE and EUS. We found that ECSB boosts the performance of traditional classifiers on imbalanced problems, achieving, on average, ~45% relative improvement in true positive rate (\(\text{TP}_{\text{rate}}\)) and ~16% in F-measure (FM). It also performs better than sampling strategies: compared with SMOTE, it yields ~35% relative improvement in \(\text{TP}_{\text{rate}}\) and ~12% in FM on average; compared with EUS, it yields similar \(\text{TP}_{\text{rate}}\) and geometric mean (GM) values and slightly higher area under the curve (AUC) values (up to ~9% relative improvement).
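The abstract names the ingredients (a genetic search wrapped around a cost-sensitive base classifier, scored with metrics such as \(\text{TP}_{\text{rate}}\), FM and GM) but not the details. The following is therefore only a toy sketch of that general idea, not the authors' ECSB algorithm: the one-dimensional threshold learner, the single evolved false-negative cost, the mutation scheme, and all function names are assumptions made for illustration.

```python
import random

def confusion(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels, with 1 as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            tp, fn = tp + (p == 1), fn + (p == 0)
        else:
            fp, tn = fp + (p == 1), tn + (p == 0)
    return tp, fn, fp, tn

def tp_rate(tp, fn, fp, tn):
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(tp, fn, fp, tn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(tp, fn, fp, tn):
    tn_rate = tn / (tn + fp) if tn + fp else 0.0
    return (tp_rate(tp, fn, fp, tn) * tn_rate) ** 0.5

def fit_threshold(xs, ys, fn_cost):
    """Hypothetical base learner: predict 1 when x >= t, where t minimises
    fn_cost * FN + FP on the training data."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(xs)) + [max(xs) + 1.0]:
        tp, fn, fp, tn = confusion(ys, [1 if x >= t else 0 for x in xs])
        cost = fn_cost * fn + fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def fitness(fn_cost, train, valid, metric):
    """Train with the given cost, then score on held-out data."""
    t = fit_threshold(train[0], train[1], fn_cost)
    preds = [1 if x >= t else 0 for x in valid[0]]
    return metric(*confusion(valid[1], preds))

def evolve_cost(train, valid, metric=f_measure, pop_size=8, generations=10, seed=0):
    """Elitist evolutionary search over the false-negative cost (assumed setup)."""
    rng = random.Random(seed)
    # Cost 1.0 (the cost-insensitive baseline) is always in the initial population.
    pop = [1.0] + [rng.uniform(1.0, 50.0) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, train, valid, metric), reverse=True)
        parents = ranked[: pop_size // 2]          # survivors carried over unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2 + rng.gauss(0.0, 2.0)  # averaging crossover + mutation
            children.append(max(1.0, child))
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, train, valid, metric))

def make_data(n_neg, n_pos, seed):
    """Synthetic imbalanced 1-D data: overlapping Gaussians, few positives."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    xs += [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
    return xs, [0] * n_neg + [1] * n_pos

train = make_data(95, 5, seed=1)
valid = make_data(95, 5, seed=2)
best_cost = evolve_cost(train, valid)
print("evolved FN cost:", best_cost)
print("validation FM:", fitness(best_cost, train, valid, f_measure))
```

Because the baseline cost of 1.0 is seeded into the population and the top half survives each generation, the evolved cost never scores worse than the baseline on the validation metric used as fitness. Passing `metric=g_mean` or `metric=tp_rate` to `evolve_cost` changes which measure the search optimises.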

Acknowledgement

The work of the authors is supported by the European Social Fund, via Programme POSDRU, DMI 1.5, ID 137516 – PARTING.

Author information

Corresponding author

Correspondence to Camelia Lemnaru.

Copyright information

© 2018 Springer International Publishing AG

Cite this paper

Lemnaru, C., Potolea, R. (2018). Evolutionary Cost-Sensitive Balancing: A Generic Method for Imbalanced Classification Problems. In: Tantar, AA., Tantar, E., Emmerich, M., Legrand, P., Alboaie, L., Luchian, H. (eds) EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation VI. Advances in Intelligent Systems and Computing, vol 674. Springer, Cham. https://doi.org/10.1007/978-3-319-69710-9_14

  • DOI: https://doi.org/10.1007/978-3-319-69710-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69708-6

  • Online ISBN: 978-3-319-69710-9

  • eBook Packages: Engineering, Engineering (R0)
