Abstract
A class-imbalanced dataset contains disproportionately many records of one class compared with the other classes. Classifiers built from class-imbalanced datasets are biased towards the majority class and thus under-perform on the minority class. Treatment methods such as sampling and cost-sensitive learning can be used to counteract the bias induced by class imbalance. In this study, we present an analogy between class imbalance and war. This analogy makes it possible to apply military strategies to class-imbalanced datasets. We propose a novel class imbalance treatment method, Standoff-Balancing, which uses a well-known mathematical law from the military strategy literature. We compare the proposed technique with four existing techniques on five real-world datasets. Our experiments show that the proposed technique may provide a higher AUC than the existing techniques.
Notes
1. A cluster’s centroid is a record whose values are the average of the records within that cluster.
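The centroid definition in the note above can be illustrated with a minimal Python sketch. It assumes every record is a list of numerical attribute values of equal length; the function name `centroid` and the sample values are hypothetical and are not taken from the paper, which may treat other attribute types differently.

```python
def centroid(records):
    """Return the attribute-wise mean of a non-empty list of numeric records."""
    if not records:
        raise ValueError("cluster must contain at least one record")
    n = len(records)
    # zip(*records) groups the values of each attribute across all records
    return [sum(values) / n for values in zip(*records)]


# Hypothetical cluster of three records with two numerical attributes each
cluster = [[1.0, 4.0], [3.0, 8.0], [5.0, 0.0]]
print(centroid(cluster))  # [3.0, 4.0]
```

Note that the centroid need not coincide with any actual record in the cluster; here `[3.0, 4.0]` is not one of the three inputs.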
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Siers, M.J., Islam, M.Z. (2015). Standoff-Balancing: A Novel Class Imbalance Treatment Method Inspired by Military Strategy. In: Pfahringer, B., Renz, J. (eds) AI 2015: Advances in Artificial Intelligence. AI 2015. Lecture Notes in Computer Science, vol 9457. Springer, Cham. https://doi.org/10.1007/978-3-319-26350-2_46
DOI: https://doi.org/10.1007/978-3-319-26350-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26349-6
Online ISBN: 978-3-319-26350-2
eBook Packages: Computer Science, Computer Science (R0)