Abstract
Clustering results often depend critically on density and similarity, and the complexity of clustering often changes as the sample dimensionality increases. In this paper, we build on the classical shared nearest neighbor (SNN) clustering algorithm and propose a high-dimensional shared nearest neighbor clustering algorithm (DSNN). DSNN is evaluated on a freeway traffic data set, and the experimental results show that it overcomes many disadvantages of the SNN algorithm, such as those related to outliers, statistics, core points, and computational complexity, and that it attains better clustering results than SNN on multi-dimensional data sets.
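For readers unfamiliar with the baseline, the following is a minimal sketch of the shared nearest neighbor similarity and density as they are commonly described for the classical SNN algorithm. It is not the DSNN method of this paper, whose details the abstract does not give. The function names, the Euclidean distance, and the parameters `k` and `eps` are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(points, k):
    """SNN similarity matrix (illustrative).

    Two points are linked only if each appears in the other's k-NN list;
    the similarity of a linked pair is the number of neighbors they share.
    """
    nn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


def snn_density(sim, eps):
    """SNN density: how many points have similarity >= eps to each point."""
    return [sum(1 for s in row if s >= eps) for row in sim]


if __name__ == "__main__":
    # Two well-separated groups of four points each (made-up toy data).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
    sim = snn_similarity(pts, k=3)
    print(snn_density(sim, eps=2))
```

In the classical scheme, points whose SNN density exceeds a threshold are treated as core points, and points with very low density are treated as noise. The brute-force neighbor search above is quadratic in the number of points and is meant only to make the definitions concrete.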
This work is supported by the National Natural Science Foundation of China (60205007), Natural Science Foundation of Guangdong Province (031558, 04300462), Research Foundation of National Science and Technology Plan Project (2004BA721A02), Research Foundation of Science and Technology Plan Project in Guangdong Province (2003C50118) and Research Foundation of Science and Technology Plan Project in Guangzhou City (2002Z3-E0017).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yin, J., Fan, X., Chen, Y., Ren, J. (2005). High-Dimensional Shared Nearest Neighbor Clustering Algorithm. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_60
DOI: https://doi.org/10.1007/11540007_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)