Standoff-Balancing: A Novel Class Imbalance Treatment Method Inspired by Military Strategy

Siers, Michael J.; Islam, Md Zahidul

doi:10.1007/978-3-319-26350-2_46

Conference paper
First Online: 22 November 2015

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9457)

Abstract

A class-imbalanced dataset contains disproportionately many records of one class compared with the other classes. Classifiers built from class-imbalanced datasets are biased and therefore under-perform on the minority class. Treatment methods such as sampling and cost-sensitive learning can be used to counteract the bias induced by class imbalance. In this study, we present an analogy between class imbalance and war. This analogy makes it possible to apply military strategies to class-imbalanced datasets. We propose a novel class imbalance treatment method, Standoff-Balancing, which uses a well-known mathematical law from the military strategy literature. We compare the proposed technique with four existing techniques on five real-world datasets. Our experiments show that the proposed technique may provide a higher AUC than existing techniques.

Notes

1. A cluster’s centroid is a record whose value for each attribute is the average of that attribute’s values over the records within the cluster.
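
The centroid definition in the note above can be illustrated with a minimal sketch. It assumes every record is a list of numeric attribute values of equal length; the function name `centroid` and the sample cluster are hypothetical and do not come from the paper.

```python
from statistics import fmean


def centroid(records):
    """Return the attribute-wise mean of a cluster of numeric records.

    Assumption: each record is a sequence of numbers and all records
    have the same length. The result need not equal any actual record.
    """
    if not records:
        raise ValueError("cluster must contain at least one record")
    # zip(*records) groups the values column by column (one attribute each).
    return [fmean(column) for column in zip(*records)]


# Hypothetical cluster of three records with two attributes each.
cluster = [[1.0, 4.0], [3.0, 8.0], [5.0, 6.0]]
print(centroid(cluster))  # [3.0, 6.0]
```

Categorical attributes would need a different summary (for example, the most frequent value); the sketch covers numeric attributes only.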

Author information

Authors and Affiliations

School of Computing and Mathematics, Charles Sturt University, Bathurst, Australia
Michael J. Siers & Md Zahidul Islam

Corresponding author

Correspondence to Michael J. Siers.

Editor information

Editors and Affiliations

The University of Waikato, Hamilton, New Zealand
Bernhard Pfahringer
The Australian National University, Canberra, Australian Capital Territory, Australia
Jochen Renz

About this paper

Cite this paper

Siers, M.J., Islam, M.Z. (2015). Standoff-Balancing: A Novel Class Imbalance Treatment Method Inspired by Military Strategy. In: Pfahringer, B., Renz, J. (eds) AI 2015: Advances in Artificial Intelligence. AI 2015. Lecture Notes in Computer Science, vol 9457. Springer, Cham. https://doi.org/10.1007/978-3-319-26350-2_46

DOI: https://doi.org/10.1007/978-3-319-26350-2_46
Published: 22 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26349-6
Online ISBN: 978-3-319-26350-2
eBook Packages: Computer Science, Computer Science (R0)
