Abstract
Finding clustering patterns in data is challenging when clusters can be of arbitrary shapes and the data contains a high percentage (e.g., 80%) of noise. This paper presents a novel technique named density-based multiscale analysis for clustering (DBMAC), which performs noise-robust clustering without any strict assumption on the shapes of clusters. First, DBMAC calculates r-neighborhood statistics for a range of r (radius) values. Next, instead of searching for a single optimal r value, it identifies a set of radius values suitable for separating "clustered" objects from "noisy" objects, using a formal statistical test for multimodality. Finally, classical DBSCAN is applied to the subset of the data that contains significantly less noise. Experimental results confirm that DBMAC is superior to classical DBSCAN in strong noise settings and also outperforms the recent SkinnyDip technique when the data contains arbitrarily shaped clusters.
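The pipeline described in the abstract (multiscale r-neighborhood statistics → noise filtering → DBSCAN on the cleaned subset) can be illustrated roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: in place of DBMAC's formal multimodality (dip) test for selecting radii, it substitutes a fixed neighbor-count threshold over a hand-picked set of radii, and all function names, parameters, and demo data are illustrative.

```python
import numpy as np

def neighbor_counts(X, r):
    """Number of other points within radius r of each point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d <= r).sum(axis=1) - 1  # subtract self

def denoise(X, radii, min_count):
    """Keep a point if it looks dense ('clustered') at any of the radii.
    DBMAC selects the radii with a statistical multimodality test on the
    neighbor-count distribution; this sketch uses a fixed threshold instead."""
    keep = np.zeros(len(X), dtype=bool)
    for r in radii:
        keep |= neighbor_counts(X, r) >= min_count
    return keep

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: grow clusters from core points; label -1 marks noise.
    min_pts counts required neighbors excluding the point itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neigh = [np.flatnonzero(row <= eps) for row in d]  # includes self
    labels = np.full(len(X), -1)
    cid = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neigh[i]) < min_pts + 1:
            continue  # already assigned, or not a core point
        labels[i] = cid
        stack = list(neigh[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid
                if len(neigh[j]) >= min_pts + 1:  # core point: keep expanding
                    stack.extend(neigh[j])
        cid += 1
    return labels

# Demo on hypothetical data: two Gaussian blobs buried in uniform noise.
rng = np.random.default_rng(0)
blobs = np.concatenate([rng.normal(c, 0.3, (30, 2))
                        for c in ([0.0, 0.0], [5.0, 5.0])])
X = np.concatenate([blobs, rng.uniform(-2, 7, (60, 2))])

keep = denoise(X, radii=(0.3, 0.5, 0.8), min_count=6)  # multiscale noise filter
labels = dbscan(X[keep], eps=0.6, min_pts=4)           # cluster the cleaned subset
```

After filtering, nearly all blob points survive while most uniform noise is removed, so the subsequent DBSCAN run operates on data with far less noise, which is the core idea behind DBMAC's reported robustness.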
© 2017 Springer International Publishing AG
Cite this paper
Zhang, T., Yuan, B. (2017). Density-Based Multiscale Analysis for Clustering in Strong Noise Settings. In: Peng, W., Alahakoon, D., Li, X. (eds.) AI 2017: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 10400. Springer, Cham. https://doi.org/10.1007/978-3-319-63004-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63003-8
Online ISBN: 978-3-319-63004-5