Abstract
Reducing the impact of outliers is an essential issue in machine learning, including clustering. There are two main approaches: building models that are robust to outliers, and removing outliers through preprocessing. In this paper, we propose a new noise clustering method that combines noise clustering, which builds a model robust to outliers, with the local outlier factor (LOF), which is used to remove outliers as a preprocessing step. The proposed method is formulated as a noise clustering optimization problem in which the dissimilarities are weighted by LOF. Numerical experiments on four artificial datasets were conducted to verify the effectiveness of the proposed method, comparing it with k-medoids clustering, DBSCAN, and noise clustering. The results show that the proposed method performs well in terms of both clustering performance and outlier detection. A guideline is also suggested for determining k and \(\varepsilon \) among the three parameters D, k, and \(\varepsilon \) required by the proposed method.
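The idea of weighting dissimilarities by LOF inside noise clustering can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not spell out. It assumes that the weighting is a plain multiplication of each object's dissimilarity by its LOF score, that D acts as the noise distance of standard noise clustering, and that k is the LOF neighborhood size. The role of \(\varepsilon \) is not stated in the abstract and is not modeled. The function names, the toy data, and the fixed medoids are hypothetical, and only a single assignment step is shown rather than the full optimization.

```python
import math

def lof_scores(points, k):
    """Brute-force local outlier factor (Breunig et al.) for each point.

    Simplification: exactly k neighbours are used, so distance ties at the
    k-th neighbour are ignored. Assumes there are no duplicate points.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
           for i in range(n)]
    k_dist = [d[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], d[i][j]) for j in knn[i])
        lrd.append(len(knn[i]) / reach)
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i])
            for i in range(n)]

def assign(points, medoids, weights, noise_distance):
    """One assignment step: nearest medoid, or -1 (noise cluster).

    ASSUMPTION: the weighted dissimilarity is weights[i] * Euclidean distance,
    and an object goes to the noise cluster when that value is not below D.
    """
    labels = []
    for i, p in enumerate(points):
        weighted = [weights[i] * math.dist(p, points[m]) for m in medoids]
        best = min(range(len(medoids)), key=weighted.__getitem__)
        labels.append(best if weighted[best] < noise_distance else -1)
    return labels

# Hypothetical toy data: two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
medoids = [0, 4]   # indices of the fixed medoids (assumed, not optimized)
D = 12.0           # noise distance (assumed value)

lof = lof_scores(points, k=3)
labels = assign(points, medoids, lof, D)                 # LOF-weighted
plain = assign(points, medoids, [1.0] * len(points), D)  # unweighted

print("LOF-weighted labels:", labels)
print("Unweighted labels:  ", plain)
```

In this toy setting the isolated point lies within the noise distance of the second medoid, so the unweighted step absorbs it into that cluster. Its LOF score is well above 1, so the weighted dissimilarity exceeds D and the point is sent to the noise cluster instead.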
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hamasuna, Y., Mori, Y. (2023). A Novel Noise Clustering Based on Local Outlier Factor. In: Honda, K., Le, B., Huynh, VN., Inuiguchi, M., Kohda, Y. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2023. Lecture Notes in Computer Science, vol 14376. Springer, Cham. https://doi.org/10.1007/978-3-031-46781-3_16
DOI: https://doi.org/10.1007/978-3-031-46781-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46780-6
Online ISBN: 978-3-031-46781-3
eBook Packages: Computer Science, Computer Science (R0)