A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets

  • Original Article
Neural Computing and Applications

Abstract

Although a large number of solutions have been proposed for imbalanced classification problems over the past decades, many studies have pointed out that class imbalance does not degrade learning performance on its own, but only in combination with other factors. One of these factors is class overlap, which plays an even larger role in the deterioration of classification performance but has often been overlooked in previous studies. In this paper, we propose a density-based adaptive k nearest neighbor method, named DBANN, which handles imbalance and overlap simultaneously. To do so, a simple but effective distance adjustment strategy is developed to adaptively find the most reliable query neighbors. Concretely, we first partition the training data into six parts using a density-based method. Next, for each part, we modify the distance metric by considering both the local and the global distribution. Finally, the prediction is made from the query neighbors selected under the new distance metric. Notably, the query neighbors of DBANN change adaptively according to the degree of imbalance and overlap. To show the validity of the proposed method, experiments are carried out on 16 synthetic datasets and 41 real-world datasets. The results, supported by appropriate statistical tests, show that the proposed method significantly outperforms state-of-the-art methods.
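The abstract does not give the six-part partition or the exact adjusted metric, so the following is only a minimal sketch of the general idea of selecting neighbors under a locally rescaled distance. It is not DBANN itself. The function names (`local_radii`, `adjusted_knn_predict`) and the scaling rule (dividing each distance by the training point's distance to its nearest other-class point) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_radii(X, y):
    """For each training point, the distance to its nearest other-class point.

    Assumption for this sketch: this radius stands in for the 'local
    distribution'. Points deep inside their own class get a large radius;
    points in an overlapping region get a small one.
    """
    radii = []
    for i, xi in enumerate(X):
        r = min(euclidean(xi, xj) for j, xj in enumerate(X) if y[j] != y[i])
        radii.append(max(r, 1e-12))  # guard against division by zero
    return radii


def adjusted_knn_predict(X, y, query, k=3):
    """Majority vote among the k neighbors ranked by the rescaled distance."""
    radii = local_radii(X, y)
    scored = sorted(
        (euclidean(query, xi) / ri, label)
        for xi, ri, label in zip(X, radii, y)
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy imbalanced data: four majority points (0), two minority points (1).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 3.5)]
y = [0, 0, 0, 0, 1, 1]
print(adjusted_knn_predict(X, y, (2.8, 3.0), k=3))
print(adjusted_knn_predict(X, y, (0.2, 0.2), k=3))
```

Dividing by the radius makes training points that lie far from the other class look closer to the query, so they are preferred as neighbors over points sitting in an overlapping region. A method along the lines described in the abstract would additionally choose the adjustment per density-based part and account for the global class distribution.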

Acknowledgements

The authors would like to thank the editor and reviewers for their useful comments and suggestions, which were of great help in improving the quality of the paper. This work is financially supported by the National Science Foundation of China (NSFC Proj. 71831006, 71801065, and 71771070), the Zhejiang Provincial Natural Science Foundation of China under Grant No. LZ20G010001, and the Promotion China Ph.D Program from BMW Brilliance Automotive.

Author information

Corresponding author

Correspondence to Xing-Gang Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yuan, BW., Luo, XG., Zhang, ZL. et al. A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets. Neural Comput & Applic 33, 4457–4481 (2021). https://doi.org/10.1007/s00521-020-05256-0

  • DOI: https://doi.org/10.1007/s00521-020-05256-0
