Standoff-Balancing: A Novel Class Imbalance Treatment Method Inspired by Military Strategy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9457)

Abstract

A class imbalanced dataset contains a disproportionate number of records from one class compared to the other classes. Classifiers built from class imbalanced datasets are biased and thus under-perform on the minority class. Treatment methods such as sampling and cost-sensitivity can be used to counteract the bias induced by class imbalance. In this study, we present an analogy between class imbalance and war. This analogy makes it possible to apply military strategies to class imbalanced datasets. We propose a novel class imbalance treatment method, Standoff-Balancing, which uses a well-known mathematical law from the military strategy literature. We compare the proposed technique with four existing techniques on five real-world datasets. Our experiments show that the proposed technique may provide a higher AUC than existing techniques.
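As an illustration of the sampling family of treatments mentioned above, the following minimal sketch performs plain random oversampling: minority-class records are duplicated at random until every class is as large as the largest one. This is a generic baseline and not the Standoff-Balancing method; the function name `random_oversample`, the `seed` parameter, and the toy data are assumptions made for the example.

```python
import random
from collections import Counter

def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen records of each smaller class until
    all classes have as many records as the largest class."""
    rng = random.Random(seed)           # fixed seed for repeatability
    counts = Counter(labels)
    target = max(counts.values())       # size of the largest class
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        # All original records belonging to this class.
        pool = [r for r, y in zip(records, labels) if y == cls]
        # Draw (with replacement) the number of records still missing.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_records.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_records, out_labels

# Toy imbalanced dataset: four majority records, one minority record.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["neg", "neg", "neg", "neg", "pos"]
Xb, yb = random_oversample(X, y)
print(Counter(yb))
```

After the call, both classes contain four records. Because the added records are exact copies, a treatment like this changes class proportions without adding new information.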

Notes

  1. A cluster’s centroid is a record whose values are the average of the records within that cluster.
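The centroid definition in the note can be sketched in a few lines, assuming every record is a list of numeric attribute values of equal length. The helper name `centroid` and the sample cluster are hypothetical and chosen only for illustration.

```python
from statistics import fmean

def centroid(records):
    """Return the attribute-wise mean of a cluster's numeric records."""
    if not records:
        raise ValueError("a cluster needs at least one record")
    # zip(*records) groups the values of each attribute across records.
    return [fmean(column) for column in zip(*records)]

cluster = [[1.0, 4.0], [3.0, 8.0], [5.0, 0.0]]
print(centroid(cluster))  # [3.0, 4.0]
```

Note that the centroid need not coincide with any actual record in the cluster; here `[3.0, 4.0]` is not one of the three inputs.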

Author information

Corresponding author

Correspondence to Michael J. Siers.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Siers, M.J., Islam, M.Z. (2015). Standoff-Balancing: A Novel Class Imbalance Treatment Method Inspired by Military Strategy. In: Pfahringer, B., Renz, J. (eds) AI 2015: Advances in Artificial Intelligence. AI 2015. Lecture Notes in Computer Science, vol 9457. Springer, Cham. https://doi.org/10.1007/978-3-319-26350-2_46

  • DOI: https://doi.org/10.1007/978-3-319-26350-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26349-6

  • Online ISBN: 978-3-319-26350-2

  • eBook Packages: Computer Science, Computer Science (R0)
