Abstract
Clustering high-dimensional data of arbitrary shape at various granularities is a challenge in data mining. This paper proposes the Nearest Neighbors Absorbed First (NNAF) clustering algorithm to address the problem, based on the idea that objects in the same cluster must be near one another. The main contributions are: (1) A theorem on searching nearest neighbors (SNN) is proved. Based on it, SNN algorithms are proposed with time complexity O(n*log(n)) or O(n), much faster than the traditional nearest-neighbor search algorithm with O(n^2). (2) The NNAF clustering algorithm for high-dimensional data of arbitrary shape is proposed, with time complexity O(n). Experiments show that the new algorithms efficiently process noisy high-dimensional data of arbitrary shape and quickly produce clusterings at various granularities with little domain knowledge.
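The abstract gives only the guiding idea (objects in the same cluster must be near) and not the NNAF procedure itself, so the following is a minimal illustrative sketch of that idea and not the authors' algorithm. It assumes Euclidean distance, uses the brute-force O(n^2) nearest-neighbor search that the abstract names as the slow baseline, and introduces a hypothetical `threshold` parameter to control granularity. The function names `nearest_neighbor` and `absorb_nearest_neighbors` are likewise assumptions made for illustration.

```python
import math


def nearest_neighbor(points, i):
    """Brute-force nearest neighbor of points[i].

    Each query scans all points, so running it for every point costs O(n^2).
    Returns (index, distance), or (-1, inf) if there is no other point.
    """
    best_j, best_d = -1, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)  # Euclidean distance (assumed metric)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


def absorb_nearest_neighbors(points, threshold):
    """Merge each point with its nearest neighbor when they are close enough.

    `threshold` is a hypothetical granularity knob: a point is absorbed into
    its nearest neighbor's cluster only if their distance is <= threshold.
    Returns one integer cluster label per point.
    """
    parent = list(range(len(points)))  # union-find forest

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(points)):
        j, d = nearest_neighbor(points, i)
        if j >= 0 and d <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (0, 2), (10, 10), (10, 11), (50, 50)]
    print(absorb_nearest_neighbors(data, threshold=1.5))
    print(absorb_nearest_neighbors(data, threshold=100))
```

In this sketch, raising `threshold` lets the isolated point at (50, 50) be absorbed into the cluster of its nearest neighbor, while a small value leaves it as a singleton. This is one simple way to obtain clusterings at different granularities; the paper's own mechanism may differ.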
Supported by a grant from the National Science Foundation of China (60473071) and the Specialized Research Fund for the Doctoral Program of the Ministry of Education (SRFDP 20020610007).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, Jj., jie-Tang, C., Peng, J., Li, C., Yuan, Ca., Chen, Al. (2005). A Clustering Algorithm Based Absorbing Nearest Neighbors. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_67
DOI: https://doi.org/10.1007/11563952_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)