A New Fast Clustering Algorithm Based on Reference and Density

  • Conference paper
Advances in Web-Age Information Management (WAIM 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2762)

Abstract

Density-based clustering is a family of cluster analysis methods that can discover clusters of arbitrary shape and is insensitive to noisy data. As data sets grow ever larger, the efficiency of data mining algorithms becomes increasingly important. In this paper, we present a new fast clustering algorithm called CURD (Clustering Using References and Density). Its novelty lies in capturing the shape and extent of a cluster with references and then analyzing the data based on those references. CURD preserves the advantages of density-based clustering, and its nearly linear time complexity makes it efficient enough for mining very large databases. Both our theoretical analysis and experimental results confirm that CURD can discover clusters of arbitrary shape and is insensitive to noisy data, while running much faster than the R*-tree-based DBSCAN algorithm.
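
The abstract does not spell out how references are chosen or used, so the following is only an illustrative sketch of the general idea of summarizing dense regions with reference points and then labeling data through them. Every detail here is an assumption made for illustration and is not taken from the paper: the function name `reference_density_clusters`, the parameters `radius` and `min_density`, the rule of using the first point seen in a region as its reference, and the `2 * radius` linking rule.

```python
import math


def reference_density_clusters(points, radius, min_density):
    """Toy reference-and-density clustering (hypothetical, not the paper's procedure).

    Returns one label per point; -1 marks points treated as noise.
    """
    # Pass 1: one scan over the data. A point joins the first reference
    # within `radius`; otherwise it becomes a new candidate reference.
    candidates = []  # each entry: [reference_point, number_of_points_covered]
    for p in points:
        for cand in candidates:
            if math.dist(p, cand[0]) <= radius:
                cand[1] += 1
                break
        else:
            candidates.append([p, 1])

    # Keep only candidates that cover enough points (assumed density test).
    refs = [ref for ref, count in candidates if count >= min_density]

    # Pass 2: link references that lie close together (assumed rule:
    # distance <= 2 * radius) using a small union-find structure.
    parent = list(range(len(refs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(refs)):
        for j in range(i + 1, len(refs)):
            if math.dist(refs[i], refs[j]) <= 2 * radius:
                parent[find(i)] = find(j)

    # Pass 3: label each point through its nearest surviving reference.
    labels = []
    for p in points:
        best = min(range(len(refs)),
                   key=lambda i: math.dist(p, refs[i]),
                   default=None)
        if best is not None and math.dist(p, refs[best]) <= radius:
            labels.append(find(best))
        else:
            labels.append(-1)
    return labels


# Two small groups and one isolated point (made-up data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
        (10.0, 0.0)]
labels = reference_density_clusters(data, radius=0.5, min_density=3)
print(labels)
```

In this sketch the first pass touches each point once and compares it only against the references found so far, which is the kind of structure that keeps the cost close to linear when the number of references stays small relative to the data size. Linking neighboring references is what would let a chain of references trace an elongated or irregular cluster.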

Supported by the National High Technology Development 863 Program of China under Grant No. 2002AA4Z3440; the Foundation of the innovation research institute of PKU-IBM; the National Grand Fundamental Research 973 Program of China under Grant No. G1999032705.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ma, S., Wang, T., Tang, S., Yang, D., Gao, J. (2003). A New Fast Clustering Algorithm Based on Reference and Density. In: Dong, G., Tang, C., Wang, W. (eds) Advances in Web-Age Information Management. WAIM 2003. Lecture Notes in Computer Science, vol 2762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45160-0_21

  • DOI: https://doi.org/10.1007/978-3-540-45160-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40715-7

  • Online ISBN: 978-3-540-45160-0

  • eBook Packages: Springer Book Archive
