Abstract
This paper presents a novel clustering algorithm based on clustering coefficient. It includes two steps: First, k-nearest-neighbor method and correlation convergence are employed for a preliminary clustering. Then, the results are further split and merged according to intra-class and inter-class concentration degree based on clustering coefficient. The proposed method takes correlation between each other in a cluster into account, thereby improving the weakness existed in previous methods that consider only the correlation with center or core data element. Experiments show that our algorithm performs better in clustering compact data elements as well as forming some irregular shape clusters. It is more suitable for applications with little prior knowledge, e.g. hotspots discovery.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Sarwar, B.M., Karypis, G., Konstan, J., et al.: Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. In: Proceedings of the Fifth International Conference on Computer and Information Technology (January 2002)
Roy, P.J., Stuart, J.M., Lund, J., et al.: Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 418(6901), 975–979 (2002)
Momtazi, S., Sameti, H., Bahrani, M., et al.: A POS-based fuzzy word clustering algorithm for continuous speech recognition systems. In: 9th International Symposium on Signal Processing and Its Applications, ISSPA 2007, pp. 1–4. IEEE (2007)
Momtazi, S., Klakow, D.: A word clustering approach for language model-based sentence retrieval in question answering systems. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1911–1914. ACM (1914)
Yasukawa, M., Yokoo, H.: Term Clustering based on Lengths and Co-occurrences of Terms. ADCS 2009, 126 (2009)
Dhillon, I.S., Mallela, S., Kumar, R.: Enhanced word clustering for hierarchical text classification. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 191–200. ACM (2002)
Wu, Y.C., Yang, J.C.: A Weighted Cluster-based Chinese Text Categorization Approach: Incorporating with Word Clusters. In: 2012 IIAI International Conference on Advanced Applied Informatics (IIAIAAI), pp. 279–282. IEEE (2012)
Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys (CSUR) 34(1), 1–47 (2002)
Karypis, G., Han, E.H., Kumar, V.: Chameleon: Hierarchical clustering using dynamic modeling. Computer 32(8), 68–75 (1999)
Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31(8), 651–666 (2010)
Ester, M., Kriegel, H.P., Sander, J., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol. 96, pp. 226–231 (1996)
Park, H.S., Jun, C.H.: A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications 36(2), 3336–3341 (2009)
Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: SDM (2003)
Guha, S., Rastogi, R., Shim, K.: CURE: an efficient clustering algorithm for large databases. ACM SIGMOD Record 27(2), 73–84 (1998)
Guha, S., Rastogi, R., Shim, K.: ROCK: A robust clustering algorithm for categorical attributes. In: Proceedings of the 15th International Conference on Data Engineering, pp. 512–521. IEEE (1999)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
Liu, Y., Nan, W., Zheng, T.: Spectral clustering for Chinese word. In: Sixth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009, vol. 1, pp. 529–533. IEEE (2009)
Vesanto, J., Alhoniemi, E.: Clustering of the self-organizing map. IEEE Transactions on Neural Networks 11(3), 586–600 (2000)
Soffer, S.N., Vázquez, A.: Network clustering coefficient without degree-correlation biases. Physical Review E 71(5), 057101 (2005)
Guest, P.G., Guest, P.G.: Numerical methods of curve fitting. Cambridge University Press (2012)
Aizawa, A.: An information-theoretic perspective of tf–idf measures. Information Processing & Management 39(1), 45–65 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhong, M., Ding, Z., Sun, H., Wang, P. (2014). A Self-learning Clustering Algorithm Based on Clustering Coefficient. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-11749-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer ScienceComputer Science (R0)