
LMNNB: Two-in-One imbalanced classification approach by combining metric learning and ensemble learning

Applied Intelligence

Abstract

In real-world applications of machine learning and cybernetics, data with imbalanced class distributions or skewed class proportions are pervasive. When dealing with imbalanced data, traditional classification approaches often fail to learn a good classifier: during training they are strongly biased by the skewed distribution, and classification performance drops drastically. In this study, we propose a novel two-in-one algorithm for classifying imbalanced data that integrates metric learning and ensemble learning. First, we design a new metric learning algorithm for imbalanced data, called Large Margin Nearest Neighbors Balance (LMNNB). This method minimizes the distance between a sample and its similar neighbors belonging to the same class while maximizing its distance from dissimilar neighbors belonging to different classes, and it retains this beneficial effect even when the data distribution is imbalanced. Through metric learning, a better classifier can thus be learned from imbalanced data. Second, we propose an ensemble learning algorithm to further improve classification performance; it combines multiple sub-classifiers and makes decisions via a soft voting strategy. Extensive experiments on real benchmark imbalanced datasets demonstrate the effectiveness of LMNNB with the ensemble algorithm (LMNNB-E) under several evaluation measures. The results show that LMNNB and LMNNB-E outperform state-of-the-art methods in classifying imbalanced data.
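The abstract only sketches the method, so the snippet below is a rough illustration of the two-stage idea rather than the authors' LMNNB-E algorithm. Scikit-learn's NeighborhoodComponentsAnalysis stands in for the LMNNB metric learner (like classic LMNN, the paper's objective pulls same-class neighbors together and pushes different-class neighbors apart), and a soft-voting ensemble of three off-the-shelf classifiers stands in for the paper's sub-classifier combination. The dataset parameters and base learners are illustrative assumptions.

```python
# A minimal sketch of the "two-in-one" pipeline: learn a supervised distance
# metric, re-embed the data, then classify with a soft-voting ensemble.
# NOTE: NCA is a stand-in for LMNNB, and the three base learners are
# arbitrary choices -- this does not reproduce the paper's exact algorithm.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data: roughly a 9:1 majority-to-minority ratio.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Soft voting averages the base learners' predicted class probabilities
# instead of counting hard labels.
ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)

# Stage 1: learn a linear transform of the feature space (the metric);
# Stage 2: run the soft-voting ensemble in the transformed space.
model = make_pipeline(NeighborhoodComponentsAnalysis(random_state=0), ensemble)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

A practical note on the design: soft voting lets a confident minority-class probability from one sub-classifier survive being outvoted by the others, which is one reason probability averaging is often preferred over hard majority voting on imbalanced data.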





Acknowledgment

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61772091, 61802035, 61962006, 61962038, U1802271, U2001212, 62072311; Digital Media Art, Key Laboratory of Sichuan Province, Sichuan Conservatory of Music, Chengdu, China under Grant No. 21DMAKL02; CCF-Huawei Database System Innovation Research Plan under Grant No. CCF-HuaweiDBIR2020004A; Chengdu Major Science and Technology Innovation Project under Grant No. 2021-YF08-00156-GX; Chengdu Technology Innovation and Research and Development Project under Grant No. 2021-YF05-00491-SN; Sichuan Science and Technology Program under Grant Nos. 2021JDJQ0021, 22ZDYF2680, 2020YFG0153, 2020YJ0481, 2020YFS0466, 2020YJ0430; the Natural Science Foundation of Guangxi under Grant No. 2018GXNSFDA138005, Guangdong Basic and Applied Basic Research Foundation under Grant No. 2020B1515120028; Guangxi Bagui Teams for Innovation and Research under Grant No. 201979.

The authors would like to thank Dr. Louis Alberto Gutierrez, a researcher with the Department of Computer Science, Rensselaer Polytechnic Institute, for proofreading this article.

Author information


Corresponding author

Correspondence to Nan Han.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Qiao, S., Han, N., Huang, F. et al. LMNNB: Two-in-One imbalanced classification approach by combining metric learning and ensemble learning. Appl Intell 52, 7870–7889 (2022). https://doi.org/10.1007/s10489-021-02901-6

