
Multi-matrices entropy discriminant ensemble learning for imbalanced problem

  • Original Article
  • Published in Neural Computing and Applications

Abstract

The objective of this paper is to improve ensemble learning for imbalanced problems. A multi-matrices approach and nearest entropy are introduced into the base-classifier model in order to exploit the spatial information of the data and the geometric relations between instances. Our method uses a variety of matrix shapes to mine the latent information in the data, and constructs a regularization term that measures the neighboring relationship among instances with entropy, enhancing the stability of the decision boundary. Different matrix shapes contain distinct spatial information; the original vector-oriented data are therefore reorganized into matrices of multiple shapes to expose this information. The nearest entropy measures the local certainty of each instance, so that stable instances can be selected for training through the new regularization term. To assess the benefits of introducing the multi-matrices and the entropy, several ensemble learning methods with similar ensemble strategies, as well as variants of linear classification models, are selected for comparison experiments on 55 binary classification datasets from the KEEL benchmark.
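The two ingredients described above, reshaping each feature vector into several matrix "views" and scoring the local certainty of an instance by the entropy of its nearest neighbors' labels, can be sketched roughly as follows. This is a minimal NumPy illustration of the general idea, not the authors' implementation; the function names `matrix_views` and `nearest_entropy` are ours.

```python
import numpy as np

def matrix_views(x, shapes):
    """Reorganize a feature vector into several matrix 'views';
    each shape exposes a different spatial arrangement of the features."""
    return [np.asarray(x).reshape(s) for s in shapes]

def nearest_entropy(X, y, k=5):
    """Shannon entropy of the class labels among each instance's k
    nearest neighbours; low entropy marks a locally stable instance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all instances.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ent = np.empty(len(X))
    for i in range(len(X)):
        nbrs = np.argsort(dists[i])[1:k + 1]   # drop the point itself
        _, counts = np.unique(y[nbrs], return_counts=True)
        p = counts / k
        ent[i] = -np.sum(p * np.log2(p))       # 0 for a pure neighbourhood
    return ent
```

In this reading, instances whose neighborhoods are label-pure get entropy 0 and would be the "stable" instances the regularization term favors, while instances near the decision boundary get entropy close to 1.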



Acknowledgements

This work is supported by the Natural Science Foundation of China under Grant No. 61672227, the "Shuguang Program" supported by the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission, the National Science Foundation of China for Distinguished Young Scholars under Grant No. 61725301, the National Key R&D Program of China under Grant No. 2018YFC0910500, and the Natural Science Foundation of China under Grant No. 61806078.

Author information

Correspondence to Zhe Wang, Jing Zhang or Wenli Du.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest between this manuscript and other published works.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Z., Chen, Z., Zhu, Y. et al. Multi-matrices entropy discriminant ensemble learning for imbalanced problem. Neural Comput & Applic 32, 8245–8264 (2020). https://doi.org/10.1007/s00521-019-04306-6
