Abstract
To improve classification performance in imbalanced learning, we propose a novel over-sampling method based on an immune network, Global Immune Centroids Over-Sampling (Global-IC). Global-IC generates a set of representative immune centroids to broaden the decision regions of small class spaces. These representative immune centroids are treated as synthetic examples in order to resolve the imbalance problem. We use an artificial immune network to generate synthetic examples on clusters with high data densities. This approach addresses a shortcoming of synthetic minority over-sampling techniques, which do not reflect groups of training examples. Our comprehensive experimental results show that Global-IC can achieve better performance than renowned multi-class resampling methods.
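The idea of treating immune centroids as synthetic examples can be illustrated with a minimal sketch. This is not the published Global-IC algorithm, whose details are not given in the abstract. It is a simplified aiNet-style loop (clone, mutate, select, suppress) written under assumptions of our own. The function names `immune_centroids` and `oversample`, the parameters `n_clones` and `suppress_radius`, and the use of Euclidean distance as affinity are all hypothetical choices made for illustration.

```python
import numpy as np


def immune_centroids(minority, n_clones=5, suppress_radius=0.3, seed=0):
    """Toy aiNet-style generator (hypothetical, not the published Global-IC)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    memory = []
    for antigen in minority:
        # Clone randomly chosen antibodies (assumption: drawn from the minority examples).
        idx = rng.integers(0, len(minority), size=n_clones)
        antibodies = minority[idx]
        # Mutate each clone by moving it a random fraction of the way toward the antigen.
        step = rng.uniform(0.0, 1.0, size=(n_clones, 1))
        clones = antibodies + step * (antigen - antibodies)
        # Clonal selection: keep the clone with the smallest distance to the antigen.
        dist = np.linalg.norm(clones - antigen, axis=1)
        memory.append(clones[np.argmin(dist)])
    # Network suppression: greedily drop any centroid closer than suppress_radius
    # to one already kept, so each dense region yields a few representatives.
    kept = []
    for c in memory:
        if all(np.linalg.norm(c - k) >= suppress_radius for k in kept):
            kept.append(c)
    return np.array(kept)


def oversample(X, y, minority_label, **kwargs):
    """Append the immune centroids of one class to the training set as synthetic examples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centroids = immune_centroids(X[y == minority_label], **kwargs)
    X_new = np.vstack([X, centroids])
    y_new = np.concatenate([y, np.full(len(centroids), minority_label)])
    return X_new, y_new
```

In this sketch each synthetic point lies on a segment between two minority examples, and the suppression step keeps the retained centroids at least `suppress_radius` apart. It does not attempt to balance class sizes exactly. For a multi-class problem, `oversample` would be called once per small class.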
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ai, X., Wu, J., Sheng, V.S., Zhao, P., Yao, Y., Cui, Z. (2015). Immune Centroids Over-Sampling Method for Multi-Class Classification. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_20
DOI: https://doi.org/10.1007/978-3-319-18038-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18037-3
Online ISBN: 978-3-319-18038-0
eBook Packages: Computer Science, Computer Science (R0)