Abstract
In recent years, the classification of imbalanced data has been hampered by skewed class distributions. Cost-Sensitive Learning (CSL) addresses the problem at the algorithm level, but it requires cost information to be supplied in advance. Mutual-Information Classification (MIC) is a Cost-Free Learning (CFL) method that derives cost information automatically; however, it focuses too heavily on the minority class and neglects accuracy on the majority-class data. Building on MIC, this paper proposes a CFL method for imbalanced data classification called Mutual-Information-SMOTE Classification (MISC). First, we use mutual-information classifiers to identify abstaining samples, i.e., samples that are difficult to classify. Second, we use these abstention samples to synthesize new samples. Third, we construct mutual-information-SMOTE classifiers by combining MIC with SMOTE. Finally, we apply these classifiers to obtain the final results. Numerical experiments comparing MISC against MIC indicate that MISC is more effective for imbalanced data classification.
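The synthesis step described in the abstract follows the standard SMOTE idea: each synthetic sample is an interpolation between a minority-class (here, abstention) sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that interpolation, not the authors' implementation; the function name `smote` and its parameters are illustrative.

```python
import numpy as np

def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples by interpolating each chosen
    minority sample toward one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(minority[:, None] - minority[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                      # pick a seed sample
        j = nn[i, rng.integers(min(k, n - 1))]   # pick one of its neighbours
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```

Because each new point is a convex combination of two existing minority samples, the synthetic data stay inside the minority class's local neighbourhoods rather than being drawn from an assumed distribution.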
Acknowledgements
This work was supported by the National High Technology Research and Development Program of China (No. 2015IM03030), the National Natural Science Foundation of China (No. 61573235), the Shanghai Innovation Action Project of Science and Technology (No. 17511103502), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this paper
Chen, Y., Chen, Y., Liu, X., Zhao, W. (2018). Mutual-Information-SMOTE: A Cost-Free Learning Method for Imbalanced Data Classification. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 873. Springer, Singapore. https://doi.org/10.1007/978-981-13-1648-7_2
Print ISBN: 978-981-13-1647-0
Online ISBN: 978-981-13-1648-7