A Clustering Algorithm Based Absorbing Nearest Neighbors

Hu, Jian-jun; jie-Tang, Chang; Peng, Jing; Li, Chuan; Yuan, Chang-an; Chen, An-long

doi:10.1007/11563952_67

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3739)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Clustering high-dimensional data of arbitrary shape at various granularities is a challenge in data mining. This paper proposes the Nearest Neighbors Absorbed First (NNAF) clustering algorithm to address the problem, based on the idea that objects in the same cluster must be near one another. The main contributions are: (1) A theorem on searching nearest neighbors (SNN) is proved. Based on it, SNN algorithms are proposed with time complexity O(n log n) or O(n), which is much faster than the traditional nearest-neighbor search algorithm at O(n²). (2) The NNAF clustering algorithm, which processes high-dimensional data of arbitrary shape, is proposed with time complexity O(n). The experiments show that the new algorithms efficiently process noisy high-dimensional data of arbitrary shape, and that they quickly produce clusterings at various granularities with little domain knowledge.
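The abstract does not give the details of NNAF or of the SNN theorem, so the following Python sketch is only an illustration of the underlying idea that objects in the same cluster must be near one another. It is not the authors' algorithm: it is plain threshold-based region growing in which a cluster keeps absorbing unlabelled points that lie within a distance `radius` of any point already absorbed. The function name `absorb_nearest_neighbors` and the parameter `radius` are hypothetical, and the neighbor scan is the brute-force O(n²) kind that the paper's SNN algorithms are said to improve on.

```python
from math import dist  # Euclidean distance, Python 3.8+


def absorb_nearest_neighbors(points, radius):
    """Illustrative region growing, NOT the NNAF algorithm from the paper.

    Each cluster starts from an unlabelled seed and absorbs every
    unlabelled point within `radius` of a point it already contains.
    Returns one integer cluster label per input point.
    """
    labels = [-1] * len(points)  # -1 means "not absorbed yet"
    cluster = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        frontier = [seed]
        while frontier:
            i = frontier.pop()
            # Brute-force scan: O(n) per point, O(n^2) overall.
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) <= radius:
                    labels[j] = cluster
                    frontier.append(j)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(absorb_nearest_neighbors(points, 1.5))  # fine granularity
print(absorb_nearest_neighbors(points, 15))   # coarse granularity
```

In this sketch the single `radius` parameter plays the role of a granularity control: a small value separates the two tight groups and the isolated point, while a larger value merges the two groups and still leaves the distant point on its own.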

Supported by a grant from the National Science Foundation of China (60473071) and by the Specialized Research Fund for the Doctoral Program of the Ministry of Education (SRFDP 20020610007).

Author information

Authors and Affiliations

School of Computer Science, Sichuan University, Chengdu, 610064, China
Jian-jun Hu, Chang jie-Tang, Jing Peng, Chuan Li, Chang-an Yuan & An-long Chen
Department of Science and Technology, Chengdu Public Security Bureau, Chengdu, 610017, China
Jing Peng
Department of Information & Technology, Guangxi Teachers Education University, Nanning, Guangxi, 530001, China
Chang-an Yuan

Editor information

Editors and Affiliations

University of Edinburgh & Bell Laboratories
Wenfei Fan
College of Computer Science, Zhejiang University, 310027, Hangzhou, Zhejiang, China
Zhaohui Wu
Dept. of E. I. E, Huazhong University of Science and Technology, Wuhan, China
Jun Yang

About this paper

Cite this paper

Hu, Jj., jie-Tang, C., Peng, J., Li, C., Yuan, Ca., Chen, Al. (2005). A Clustering Algorithm Based Absorbing Nearest Neighbors. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_67

DOI: https://doi.org/10.1007/11563952_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)
