Abstract
The objective of this paper is to improve ensemble learning for imbalanced problems. A multi-matrices approach and a nearest-entropy measure are introduced into the base-classifier model in order to exploit the spatial information of the data and the geometric relations between instances. Our method uses a variety of matrix shapes to mine latent information in the data, and it constructs a regularization term that measures the neighboring relationships among instances with entropy to enhance the stability of the decision boundary. Since different matrix shapes carry distinct spatial information, the original vector-oriented data are reorganized into multiple matrix shapes to expose that information. The nearest entropy measures the local certainty of each instance, so that stable instances can be selected for training through the new regularization term. To assess the benefit of introducing the multi-matrices and the entropy, several ensemble learning methods with a similar ensemble strategy, together with variants of linear classification models, are compared in experiments on 55 binary classification datasets from the KEEL benchmark.
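As a rough illustration of the two ingredients the abstract describes (not the authors' implementation), the sketch below reorganizes a feature vector into several matrix shapes and scores the local certainty of an instance with a Shannon entropy over its k nearest neighbours. The function names, the Euclidean k-NN, and the choice of shapes are assumptions made for illustration only:

```python
import numpy as np
from collections import Counter

def matrix_views(x, shapes):
    # Reorganize a feature vector into several matrix shapes; each
    # (rows, cols) pair with rows * cols == len(x) yields a distinct
    # spatial view of the same data.
    return [x.reshape(r, c) for r, c in shapes if r * c == x.size]

def nearest_entropy(X, y, i, k=5):
    # Shannon entropy of the class distribution among the k nearest
    # neighbours of instance i; low entropy marks a locally "stable"
    # instance whose neighbourhood is dominated by one class.
    dist = np.linalg.norm(X - X[i], axis=1)
    nn = np.argsort(dist)[1:k + 1]        # skip the instance itself
    counts = np.array(list(Counter(y[nn]).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 12))             # 20 instances, 12 features each
y = np.array([0] * 10 + [1] * 10)         # binary labels

views = matrix_views(X[0], [(3, 4), (4, 3), (2, 6)])
entropy0 = nearest_entropy(X, y, 0)
```

In this reading, a base classifier would be trained on each matrix view, and instances with low `nearest_entropy` would be favoured by the entropy-based regularization term.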
Acknowledgements
This work was supported by the Natural Science Foundation of China under Grant No. 61672227, the "Shuguang Program" of the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission, the National Science Foundation of China for Distinguished Young Scholars under Grant No. 61725301, the National Key R&D Program of China under Grant No. 2018YFC0910500, and the Natural Science Foundation of China under Grant No. 61806078.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest between this manuscript and other published works.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, Z., Chen, Z., Zhu, Y. et al. Multi-matrices entropy discriminant ensemble learning for imbalanced problem. Neural Comput & Applic 32, 8245–8264 (2020). https://doi.org/10.1007/s00521-019-04306-6