
LMNNB: Two-in-One imbalanced classification approach by combining metric learning and ensemble learning

Applied Intelligence

Abstract

In real-world applications of machine learning and cybernetics, data with imbalanced class distributions or skewed class proportions are pervasive. When dealing with imbalanced data, traditional classification approaches often fail to learn a good classifier: during training they are strongly biased by the skewed distribution, and classification performance drops drastically. In this study, we propose a novel two-in-one algorithm for classifying imbalanced data that integrates metric learning and ensemble learning. First, we design a new metric learning algorithm for imbalanced data, called Large Margin Nearest Neighbors Balance (LMNNB). This method minimizes the distance between a sample and its similar neighbors belonging to the same class while maximizing its distance from dissimilar neighbors belonging to different classes, and it retains this beneficial effect even when the data distribution is imbalanced. Through metric learning, a better classifier can thus be learned from imbalanced data. Second, we propose an ensemble learning algorithm to further improve classification performance; it combines multiple sub-classifiers and makes decisions via a soft voting strategy. Extensive experiments on real benchmark imbalanced datasets demonstrate the effectiveness of LMNNB with the ensemble algorithm (LMNNB-E) under several evaluation measures. The results show that LMNNB and LMNNB-E outperform state-of-the-art methods in classifying imbalanced data.
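The abstract only sketches the method, so the snippet below is a rough illustration of the two-stage idea rather than the authors' LMNNB-E algorithm. Scikit-learn's NeighborhoodComponentsAnalysis stands in for the LMNNB metric learner (like classic LMNN, the paper's objective pulls same-class neighbors together and pushes different-class neighbors apart), and a soft-voting ensemble of three off-the-shelf classifiers stands in for the paper's sub-classifier combination. The dataset parameters and base learners are illustrative assumptions.

```python
# A minimal sketch of the "two-in-one" pipeline: learn a supervised distance
# metric, re-embed the data, then classify with a soft-voting ensemble.
# NOTE: NCA is a stand-in for LMNNB, and the three base learners are
# arbitrary choices -- this does not reproduce the paper's exact algorithm.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data: roughly a 9:1 majority-to-minority ratio.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Soft voting averages the base learners' predicted class probabilities
# instead of counting hard labels.
ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)

# Stage 1: learn a linear transform of the feature space (the metric);
# Stage 2: run the soft-voting ensemble in the transformed space.
model = make_pipeline(NeighborhoodComponentsAnalysis(random_state=0), ensemble)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

A practical note on the design: soft voting lets a confident minority-class probability from one sub-classifier survive being outvoted by the others, which is one reason probability averaging is often preferred over hard majority voting on imbalanced data.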





Acknowledgment

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61772091, 61802035, 61962006, 61962038, U1802271, U2001212, 62072311; Digital Media Art, Key Laboratory of Sichuan Province, Sichuan Conservatory of Music, Chengdu, China under Grant No. 21DMAKL02; CCF-Huawei Database System Innovation Research Plan under Grant No. CCF-HuaweiDBIR2020004A; Chengdu Major Science and Technology Innovation Project under Grant No. 2021-YF08-00156-GX; Chengdu Technology Innovation and Research and Development Project under Grant No. 2021-YF05-00491-SN; Sichuan Science and Technology Program under Grant Nos. 2021JDJQ0021, 22ZDYF2680, 2020YFG0153, 2020YJ0481, 2020YFS0466, 2020YJ0430; the Natural Science Foundation of Guangxi under Grant No. 2018GXNSFDA138005, Guangdong Basic and Applied Basic Research Foundation under Grant No. 2020B1515120028; Guangxi Bagui Teams for Innovation and Research under Grant No. 201979.

The authors would like to thank Dr. Louis Alberto Gutierrez, a researcher with the Department of Computer Science, Rensselaer Polytechnic Institute, for proofreading this article.

Author information


Corresponding author

Correspondence to Nan Han.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Qiao, S., Han, N., Huang, F. et al. LMNNB: Two-in-One imbalanced classification approach by combining metric learning and ensemble learning. Appl Intell 52, 7870–7889 (2022). https://doi.org/10.1007/s10489-021-02901-6

