Abstract
Imbalanced data suffer from the problem that minority classes are under-represented compared with majority ones. Traditional imbalanced learning algorithms consider only the class imbalance while ignoring class overlap, which leads to poor accuracy for minority samples in overlapping regions. To address this issue, we propose a deep metric framework with borderline-margin loss (DMFBML) that improves intra-class coherence and inter-class separation in overlapping regions. First, a flexible borderline margin is designed for each minority sample and is adaptively adjusted according to the labels of its neighborhood. The proposed margin makes it possible to discriminate minority samples with varying degrees of overlap, which preserves valuable information near the classification boundary. The input data are then reconstructed into a set of training triplets to generate more metric constraints for minority samples, thereby increasing inter-class separation in overlapping regions. Finally, a neural network with DMFBML is presented to achieve better classifier performance on imbalanced data. The proposed method is verified through comparative experiments on six synthetic datasets and eleven real-world datasets.
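As an illustrative sketch only (not the authors' implementation), the idea of a borderline margin that grows with a minority sample's overlap degree, used inside a per-sample triplet loss, might look like the following. The neighborhood size `k`, the base margin, the linear scaling rule, and all function names here are assumptions:

```python
import numpy as np

def borderline_margin(X, y, idx, k=3, base_margin=1.0):
    """Sketch of an adaptive borderline margin: the margin for sample `idx`
    grows with the fraction of opposite-class samples among its k nearest
    neighbors, i.e. with its overlap degree (linear scaling is an assumption)."""
    dists = np.linalg.norm(X - X[idx], axis=1)
    neighbors = np.argsort(dists)[1:k + 1]        # skip the sample itself
    overlap = np.mean(y[neighbors] != y[idx])      # fraction of opposite-class neighbors
    return base_margin * (1.0 + overlap)

def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss, here taking the per-sample adaptive margin."""
    d_ap = np.linalg.norm(anchor - positive)       # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)       # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)
```

A sample deep in an overlapping region (many opposite-class neighbors) thus receives a larger margin, forcing the learned embedding to push negatives further away from it than from well-separated samples.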
Acknowledgements
This work was supported by the National Key R&D Program of China (2018YFB1305902) and the National Natural Science Foundation of China under Grant 61773260.
Cite this article
Yan, M., Li, N. Borderline-margin loss based deep metric learning framework for imbalanced data. Appl Intell 53, 1487–1504 (2023). https://doi.org/10.1007/s10489-022-03494-4