
Mutual-Information-SMOTE: A Cost-Free Learning Method for Imbalanced Data Classification

  • Conference paper
  • First Online:
Computational Intelligence and Intelligent Systems (ISICA 2017)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 873))


Abstract

In recent years, classification of imbalanced data has been hampered by skewed class distributions. Cost-Sensitive Learning (CSL) is one solution at the algorithm level, but it requires cost information to be provided in advance. Mutual-Information Classification (MIC) is a Cost-Free Learning (CFL) method that summarizes cost information automatically; however, it pays too much attention to the minority class and neglects the accuracy on the majority class. Building on MIC, this paper proposes a CFL method for imbalanced data classification called Mutual-Information-SMOTE Classification (MISC). First, mutual-information classifiers are used to identify abstaining samples, i.e., samples that are difficult to classify. Second, these abstention samples are used to synthesize new samples. Third, mutual-information-SMOTE classifiers are constructed by combining MIC and SMOTE. Finally, these classifiers are used to obtain the final results. Numerical comparisons with MIC indicate that MISC is more effective for imbalanced data classification.
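The abstract only outlines the pipeline, but its second step, synthesizing new samples from hard-to-classify (abstention) points, follows the standard SMOTE recipe: interpolate between a minority sample and one of its k nearest neighbours. The sketch below is illustrative, not the authors' implementation; the function name `smote_synthesize` and the choice of k are assumptions, and in MISC the input would be the abstention samples flagged by the mutual-information classifier rather than the whole minority class.

```python
import numpy as np

def smote_synthesize(samples, n_new, k=3, rng=None):
    """SMOTE-style oversampling sketch: create n_new synthetic points,
    each a random interpolation between a sample and one of its
    k nearest neighbours (Euclidean distance)."""
    rng = np.random.default_rng(rng)
    samples = np.asarray(samples, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(samples))
        x = samples[i]
        # distances from x to all other samples; exclude x itself
        d = np.linalg.norm(samples - x, axis=1)
        d[i] = np.inf
        neighbours = samples[np.argsort(d)[:k]]
        nbr = neighbours[rng.integers(len(neighbours))]
        # synthetic point lies on the segment between x and nbr
        gap = rng.random()
        out.append(x + gap * (nbr - x))
    return np.array(out)
```

Because each synthetic point is a convex combination of two existing points, the generated data stays inside the convex hull of the abstention set, which is what lets SMOTE densify the hard region without inventing outliers.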

Acknowledgements

This work was supported by the National High Technology Research and Development Program of China (No. 2015IM03030), the National Natural Science Foundation of China (No. 61573235), the Shanghai Innovation Action Project of Science and Technology (No. 17511103502), and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Yufei Chen.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, Y., Chen, Y., Liu, X., Zhao, W. (2018). Mutual-Information-SMOTE: A Cost-Free Learning Method for Imbalanced Data Classification. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 873. Springer, Singapore. https://doi.org/10.1007/978-981-13-1648-7_2

  • DOI: https://doi.org/10.1007/978-981-13-1648-7_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1647-0

  • Online ISBN: 978-981-13-1648-7
