A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets

Yuan, Bo-Wen; Luo, Xing-Gang; Zhang, Zhong-Liang; Yu, Yang; Huo, Hong-Wei; Johannes, Tretter; Zou, Xiao-Dong

doi:10.1007/s00521-020-05256-0

Original Article
Published: 09 August 2020

Neural Computing and Applications, Volume 33, pages 4457–4481 (2021)

Bo-Wen Yuan¹,³, Xing-Gang Luo², Zhong-Liang Zhang², Yang Yu¹, Hong-Wei Huo³, Tretter Johannes³ & Xiao-Dong Zou³

879 Accesses
19 Citations

Abstract

Although many solutions to imbalanced classification have been proposed over the past decades, many studies have pointed out that class imbalance does not degrade learning performance on its own, but only in combination with other factors. One of these factors is class overlap, which plays an even larger role in degrading classification performance but has largely been ignored in previous studies. In this paper, we propose a density-based adaptive k nearest neighbor method, DBANN, which handles imbalance and overlap simultaneously. To do so, we develop a simple but effective distance adjustment strategy that adaptively finds the most reliable query neighbors. Concretely, we first partition the training data into six parts using a density-based method. Next, for each part, we modify the distance metric by considering both the local and the global distribution. Finally, the output is determined by the query neighbors selected under the new distance metric. Notably, the query neighbors of DBANN change adaptively according to the degree of imbalance and overlap. To validate the proposed method, experiments are carried out on 16 synthetic datasets and 41 real-world datasets. The results, supported by appropriate statistical tests, show that the proposed method significantly outperforms state-of-the-art methods.
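The three steps outlined in the abstract (density-based partition, per-part distance adjustment, neighbor vote under the adjusted distance) can be illustrated with a small sketch. This is not the DBANN algorithm itself, whose details are not given here. It is a simplified stand-in under stated assumptions: the training data are split into only two parts ("dense" and "sparse") rather than six, the global term is taken to be the class frequency, and the local term is a fixed penalty for sparse points. The function names, `eps`, `min_pts`, `sparse_penalty` and the toy data are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_partition(X, eps, min_pts):
    """Step 1 (assumed rule): mark a point 'dense' if at least min_pts
    other training points lie within eps of it, otherwise 'sparse'."""
    parts = []
    for i, p in enumerate(X):
        count = sum(
            1 for j, q in enumerate(X) if j != i and euclidean(p, q) <= eps
        )
        parts.append("dense" if count >= min_pts else "sparse")
    return parts


def distance_scales(y, parts, sparse_penalty=1.5):
    """Step 2 (assumed rule): per-point distance multiplier combining a
    global term (class frequency, smaller for the minority class) and a
    local term (penalty that pushes sparse points further away)."""
    freq = Counter(y)
    n = len(y)
    scales = []
    for label, part in zip(y, parts):
        global_term = freq[label] / n
        local_term = sparse_penalty if part == "sparse" else 1.0
        scales.append(global_term * local_term)
    return scales


def predict(X, y, scales, query, k):
    """Step 3: majority vote among the k neighbours that are nearest
    under the rescaled distance."""
    adjusted = sorted(
        (euclidean(query, p) * s, label) for p, s, label in zip(X, scales, y)
    )
    votes = Counter(label for _, label in adjusted[:k])
    return votes.most_common(1)[0][0]


# Toy data: six majority points (class 0) and three minority points (class 1).
X_train = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2),
           (3, 3), (3, 2.5), (2.5, 3)]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]

parts = density_partition(X_train, eps=1.0, min_pts=2)
scales = distance_scales(y_train, parts)

query = (2, 2)  # lies in the region where the two classes meet
plain = predict(X_train, y_train, [1.0] * len(X_train), query, k=3)
adjusted = predict(X_train, y_train, scales, query, k=3)
print("plain kNN:", plain, "| rescaled kNN:", adjusted)
```

On this toy set the unscaled vote is dominated by two isolated majority points near the query, while the rescaled distance lets the compact minority cluster supply all three neighbors. The choice of multipliers here is purely illustrative; any real use would need the rule to be tuned and validated.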

Acknowledgements

The authors would like to thank the editor and reviewers for their useful comments and suggestions, which greatly helped improve the quality of the paper. This work was financially supported by the National Science Foundation of China (NSFC Proj. 71831006, 71801065, and 71771070), the Zhejiang Provincial Natural Science Foundation of China under Grant No. LZ20G010001, and the Promotion China Ph.D. Program from BMW Brilliance Automotive.

Author information

Authors and Affiliations

Department of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
Bo-Wen Yuan & Yang Yu
Department of Management, Hangzhou Dianzi University, Hangzhou, 310018, China
Xing-Gang Luo & Zhong-Liang Zhang
Department of Foundry, BMW Brilliance Automotive Ltd, Shenyang, 110143, China
Bo-Wen Yuan, Hong-Wei Huo, Tretter Johannes & Xiao-Dong Zou

Corresponding author

Correspondence to Xing-Gang Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yuan, BW., Luo, XG., Zhang, ZL. et al. A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets. Neural Comput & Applic 33, 4457–4481 (2021). https://doi.org/10.1007/s00521-020-05256-0

Received: 05 October 2019
Accepted: 27 July 2020
Published: 09 August 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s00521-020-05256-0
