A New Fast Clustering Algorithm Based on Reference and Density

Ma, Shuai; Wang, TengJiao; Tang, ShiWei; Yang, DongQing; Gao, Jun

doi:10.1007/978-3-540-45160-0_21


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2762)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

Density-based clustering is a family of cluster analysis methods that can discover clusters of arbitrary shape and is insensitive to noise. As data sets grow ever larger, the efficiency of data mining algorithms becomes increasingly important. In this paper, we present a new fast clustering algorithm called CURD (Clustering Using References and Density). Its novelty lies in capturing the shape and extent of a cluster with references and then analyzing the data based on those references. CURD preserves the advantages of density-based clustering and, because of its nearly linear time complexity, is efficient enough for mining very large databases. Both our theoretical analysis and our experimental results confirm that CURD can discover clusters of arbitrary shape and is insensitive to noise. Its execution efficiency is also much higher than that of the R*-tree-based DBSCAN algorithm.
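The abstract gives only the outline of the method, so the following is a minimal sketch of one way a reference-and-density scheme could work, not the CURD procedure as published. It makes several assumptions that do not come from the source: a reference is simply the first point seen in an uncovered region, a reference is "dense" when it covers at least `min_density` points within `radius`, and two dense references belong to the same cluster when they lie within `2 * radius` of each other. The function name and all parameters are hypothetical.

```python
import math


def cluster_by_references(points, radius, min_density):
    """Illustrative reference-based density clustering (assumed scheme, not the paper's)."""
    refs = []      # coordinates of each reference point
    members = []   # indices of the data points covered by each reference

    # Pass 1: attach every point to the first reference within `radius`,
    # or open a new reference at that point if none is close enough.
    for i, p in enumerate(points):
        for k, r in enumerate(refs):
            if math.dist(p, r) <= radius:
                members[k].append(i)
                break
        else:
            refs.append(p)
            members.append([i])

    # Keep only references that cover enough points; the rest are treated as noise.
    dense = [k for k in range(len(refs)) if len(members[k]) >= min_density]

    # Union-find over dense references: link two of them when their
    # covered regions can touch (centres at most 2 * radius apart).
    parent = {k: k for k in dense}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for pos, a in enumerate(dense):
        for b in dense[pos + 1:]:
            if math.dist(refs[a], refs[b]) <= 2 * radius:
                parent[find(a)] = find(b)

    # Pass 2: label each point with the cluster of its reference; -1 marks noise.
    labels = [-1] * len(points)
    cluster_ids = {}
    for k in dense:
        cid = cluster_ids.setdefault(find(k), len(cluster_ids))
        for i in members[k]:
            labels[i] = cid
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.2, 0.1),
            (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
            (10, 0)]
    print(cluster_by_references(data, radius=0.5, min_density=3))
```

Because clusters are formed by chaining neighbouring references rather than by fitting a fixed shape, a scheme of this kind can follow elongated or curved groups, and isolated points end up on sparse references that are discarded as noise. The cost of the first pass depends on the number of references rather than on the number of point pairs, which is the intuition behind summarising data with references; the complexity claims in the abstract apply to the authors' algorithm, not to this sketch.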

Supported by the National High Technology Development 863 Program of China under Grant No. 2002AA4Z3440; the Foundation of the innovation research institute of PKU-IBM; the National Grand Fundamental Research 973 Program of China under Grant No. G1999032705.



Author information

Authors and Affiliations

Department of Computer Science, Peking University, Beijing, 100871, China
Shuai Ma, TengJiao Wang, ShiWei Tang, DongQing Yang & Jun Gao
National Laboratory on Machine Perception, Peking University, Beijing, 100871, China
ShiWei Tang


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Wright State University, USA
Guozhu Dong
School of Computer Science, Sichuan University, 610065, Chengdu, China
Changjie Tang
UNC Chapel Hill
Wei Wang


About this paper

Cite this paper

Ma, S., Wang, T., Tang, S., Yang, D., Gao, J. (2003). A New Fast Clustering Algorithm Based on Reference and Density. In: Dong, G., Tang, C., Wang, W. (eds) Advances in Web-Age Information Management. WAIM 2003. Lecture Notes in Computer Science, vol 2762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45160-0_21


DOI: https://doi.org/10.1007/978-3-540-45160-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40715-7
Online ISBN: 978-3-540-45160-0
eBook Packages: Springer Book Archive
