Abstract
Identifying noisy instances is an effective way to improve the predictive performance of machine learning algorithms. Noise in a data set has two major negative consequences: (i) a decrease in classification accuracy and (ii) an increase in the complexity of the induced model. Removing noisy instances can therefore improve the performance of the induced models. However, noise identification is especially challenging when learning complex functions, which often contain outliers. To detect such noise, we present a novel approach, DRN, for detecting and removing noisy instances. In our approach, we combine a self-organizing map (SOM) with a classifier in an ensemble. DRN can effectively distinguish between outliers and noisy instances. We evaluate the performance of the proposed algorithm using five different classifiers (viz. J48, Naive Bayes, Support Vector Machine, \(k\)-Nearest Neighbor, and Random Forest) and 10 benchmark data sets from the UCI Machine Learning Repository. Experimental results show that DRN removes noisy instances effectively and achieves better accuracy than the existing state-of-the-art algorithm on various data sets.
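The abstract does not spell out how the SOM and the classifier are combined, so the following is only a minimal sketch of one plausible pairing and should not be read as the DRN procedure itself. It assumes a deliberately tiny two-unit SOM, a leave-one-out \(k\)-nearest-neighbor vote as the classifier, and a rule that flags an instance only when both the majority label of its SOM unit and the classifier disagree with its given label. All function names (`bmu`, `train_som`, `knn_loo_predict`, `flag_noisy`), the toy data, and every parameter value are hypothetical. The sketch does not attempt to separate outliers from noisy instances.

```python
import math
import random
from collections import Counter


def bmu(W, x):
    """Index of the best-matching unit (closest weight vector) for x."""
    return min(range(len(W)), key=lambda j: math.dist(W[j], x))


def train_som(X, n_units=2, n_iter=500, lr0=0.5, sigma0=0.5, seed=0):
    """Train a 1-D chain SOM (n_units >= 2) and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(X[0])
    lo = [min(x[d] for x in X) for d in range(dim)]
    hi = [max(x[d] for x in X) for d in range(dim)]
    # Linear initialisation: spread the units evenly between the data extremes.
    W = [[lo[d] + (hi[d] - lo[d]) * j / (n_units - 1) for d in range(dim)]
         for j in range(n_units)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighbourhood width shrinks
        x = rng.choice(X)
        b = bmu(W, x)
        for j in range(n_units):
            # Gaussian neighbourhood on the chain, centred on the winning unit.
            h = math.exp(-((j - b) ** 2) / (2.0 * sigma ** 2))
            W[j] = [w + lr * h * (xi - w) for w, xi in zip(W[j], x)]
    return W


def knn_loo_predict(X, y, i, k=7):
    """Leave-one-out k-NN prediction for instance i (i itself is excluded)."""
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: math.dist(X[j], X[i]))
    votes = Counter(y[j] for j in others[:k])
    return votes.most_common(1)[0][0]


def flag_noisy(X, y, W, k=7):
    """Assumed rule: flag i when SOM-unit majority AND k-NN both reject y[i]."""
    units = [bmu(W, x) for x in X]
    unit_major = {}
    for u in set(units):
        votes = Counter(y[i] for i in range(len(X)) if units[i] == u)
        unit_major[u] = votes.most_common(1)[0][0]
    return [i for i in range(len(X))
            if unit_major[units[i]] != y[i]
            and knn_loo_predict(X, y, i, k) != y[i]]


def grid_cluster(cx, cy, step=0.25, n=5):
    """Deterministic n-by-n grid of points centred on (cx, cy)."""
    return [(cx + step * (a - n // 2), cy + step * (b - n // 2))
            for a in range(n) for b in range(n)]


# Toy data (hypothetical): two well-separated clusters, one per class.
X = grid_cluster(0.0, 0.0) + grid_cluster(5.0, 5.0)
y = [0] * 25 + [1] * 25
flipped = [0, 12, 24]   # inject label noise into three class-0 instances
for i in flipped:
    y[i] = 1

W = train_som(X)
noisy = flag_noisy(X, y, W)
print("flagged as noisy:", noisy)
```

On this toy data the flagged indices should coincide with the three injected label flips. Requiring agreement between the two detectors is a conservative design choice in this sketch: an instance that only one of them rejects is kept.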
Acknowledgments
The authors thank the anonymous reviewers whose suggestions helped to clarify and improve our paper. This work was supported in part by the National Science Foundation under grant number OIA-1946231 and the Louisiana Board of Regents for the Louisiana Materials Design Alliance (LAMDA).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Hasan, R., Chu, CH.H. (2022). DRN: Detection and Removal of Noisy Instances with Self Organizing Map. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13364. Springer, Cham. https://doi.org/10.1007/978-3-031-09282-4_40
DOI: https://doi.org/10.1007/978-3-031-09282-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09281-7
Online ISBN: 978-3-031-09282-4
eBook Packages: Computer Science, Computer Science (R0)