High-Dimensional Shared Nearest Neighbor Clustering Algorithm

Yin, Jian; Fan, Xianli; Chen, Yiqun; Ren, Jiangtao

doi:10.1007/11540007_60


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

Clustering results often depend critically on density and similarity, and the complexity of clustering often changes as the dimensionality of the samples increases. In this paper, we build on the classical shared nearest neighbor clustering algorithm (SNN) and propose a high-dimensional shared nearest neighbor clustering algorithm (DSNN). DSNN is evaluated on a freeway traffic data set, and the experimental results show that it addresses many disadvantages of the SNN algorithm, such as its handling of outliers, statistics, core points, and computational complexity, and that it attains better clustering results than SNN on multi-dimensional data.
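To make the starting point concrete, the following is a minimal sketch of the shared nearest neighbor similarity and density that classical SNN clustering is commonly described as using. It is not the DSNN algorithm from this paper, whose details are not given in the abstract. The function names (`knn_lists`, `snn_similarity`, `snn_density`), the Euclidean distance, and the parameters `k` and `eps` are illustrative assumptions.

```python
import math


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Assumption: Euclidean distance; ties are broken by point index.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(points, k):
    """SNN similarity matrix (hypothetical helper).

    Two points get a nonzero similarity only if each appears in the other's
    k-nearest-neighbor list; the value is the number of neighbors they share.
    """
    nn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


def snn_density(sim, eps):
    """For each point, count the points whose SNN similarity is at least eps."""
    return [sum(1 for s in row if s >= eps) for row in sim]


if __name__ == "__main__":
    # Two well-separated groups of four points each (toy data, not from the paper).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
    sim = snn_similarity(pts, k=3)
    print(snn_density(sim, eps=2))
```

In this sketch, points with a high SNN density would be candidates for core points, and points with a low density would be treated as noise. The thresholds that decide this are further assumptions and are not taken from the paper.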

This work is supported by the National Natural Science Foundation of China (60205007), the Natural Science Foundation of Guangdong Province (031558, 04300462), the Research Foundation of the National Science and Technology Plan Project (2004BA721A02), the Research Foundation of the Science and Technology Plan Project in Guangdong Province (2003C50118), and the Research Foundation of the Science and Technology Plan Project in Guangzhou City (2002Z3-E0017).


Author information

Authors and Affiliations

Zhongshan University, Guangdong, 510275, P.R. China
Jian Yin, Xianli Fan, Yiqun Chen & Jiangtao Ren
Guangdong Institute of Education, Guangdong, P.R. China
Yiqun Chen


Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin

About this paper

Cite this paper

Yin, J., Fan, X., Chen, Y., Ren, J. (2005). High-Dimensional Shared Nearest Neighbor Clustering Algorithm. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_60

DOI: https://doi.org/10.1007/11540007_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)
