Abstract
The Locally Weighted Naïve Bayes (LWNB) method fits a weighted Naïve Bayes model in the neighborhood of each query point. Like other classification methods, LWNB is affected by class imbalance, which arises when the class variable has a skewed distribution and biases classification algorithms towards the majority class. Resampling approaches such as undersampling and oversampling can mitigate this problem. However, resampling the whole data set may not be reflected correctly in the local regions, since each region is assumed to be independent of the data outside it. Local regions should therefore be handled without outside interference. In this study, we proposed a novel resampling approach that is applicable to both undersampling and oversampling. We examined how the imbalance of the data set should be reflected in each local region and aimed to prevent the imbalance problem by resampling the data in each local region separately. In this method, we calculated the appropriate resampling rate and number of neighbors for each local region from the imbalance rate of the data and a resampling rate chosen by the researcher. The proposed approach was compared with classical resampling approaches on 25 datasets that are frequently used in the literature and achieved promising results.
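The idea of resampling inside a local region rather than on the whole data set can be illustrated with a minimal Python sketch. It is not the procedure of the paper: the abstract does not give the formulas for the local resampling rate or the adjusted number of neighbors, so the sketch assumes a binary problem, Euclidean nearest neighbors, plain random oversampling, and a hypothetical `target_ratio` parameter (desired minority-to-majority ratio inside the region). The function name and the toy data are likewise assumptions made for illustration.

```python
import numpy as np

def local_random_oversample(X, y, query, k=10, target_ratio=1.0, seed=0):
    """Illustrative sketch: oversample the minority class inside the
    k-nearest-neighbor region of `query` only (binary labels assumed).

    `target_ratio` is a hypothetical parameter, not the paper's
    resampling-rate calculation.
    """
    rng = np.random.default_rng(seed)
    # Euclidean distance from the query to every training point
    dist = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(dist)[:k]          # indices of the local region
    X_loc, y_loc = X[idx], y[idx]

    classes, counts = np.unique(y_loc, return_counts=True)
    if len(classes) < 2:
        return X_loc, y_loc             # only one class in the region

    minority = classes[np.argmin(counts)]
    n_extra = int(round(target_ratio * counts.max())) - counts.min()
    if n_extra <= 0:
        return X_loc, y_loc             # region already balanced enough

    # Duplicate randomly chosen local minority points
    min_idx = np.flatnonzero(y_loc == minority)
    extra = rng.choice(min_idx, size=n_extra, replace=True)
    return np.vstack([X_loc, X_loc[extra]]), np.concatenate([y_loc, y_loc[extra]])

# Toy data: 15 majority points (label 0) and 3 minority points (label 1)
X = np.concatenate([np.arange(15.0), [0.5, 1.5, 2.5]]).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 3)
X_res, y_res = local_random_oversample(X, y, query=np.array([0.0]), k=10)
print(np.bincount(y_res))
```

A local classifier, such as a distance-weighted Naïve Bayes model, would then be fitted on `X_res` and `y_res` instead of the raw neighborhood; that step is omitted here.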
Availability of data and material
The research only uses openly available datasets.
Code availability
The code is available at https://github.com/fatihsaglam/Locally-Resampling.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
We have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
The paper is not currently being considered for publication elsewhere. No humans or animals were involved in this research.
Consent to participate
There are no human or animal participants in the study.
Consent to publication
Not applicable.
About this article
Cite this article
Sağlam, F., Cengiz, M.A. Local resampling for locally weighted Naïve Bayes in imbalanced data. Computing 106, 185–200 (2024). https://doi.org/10.1007/s00607-023-01219-0
DOI: https://doi.org/10.1007/s00607-023-01219-0