Abstract
A novel clustering method based on spectral clustering theory and a spectral cut criterion is proposed, motivated by the characteristics of short text and the shortcomings of existing clustering algorithms. First, following spectral clustering theory, a weighted undirected graph is built over the documents, the similarity between every pair of nodes is calculated, and a symmetric document similarity matrix is constructed; this matrix supplies all the information the clustering algorithm uses. Inspired by the greedy strategy, we use Prim's algorithm to develop the PrimMAE algorithm, which partitions the graph into two parts, with RMcut serving as the termination condition of the partitioning process. PrimMAE is then used within the CASC algorithm to cut the document set iteratively. Finally, high-quality clustering results demonstrate the effectiveness of the new clustering algorithm.
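The abstract names the stages of the pipeline (similarity graph, Prim-based bipartition, a cut criterion that ends partitioning, iterative cutting) but does not define PrimMAE, RMcut or CASC. The sketch below illustrates that overall shape under stated assumptions and is not the authors' method. Documents are assumed to be term-frequency vectors compared by cosine similarity. The bipartition builds a maximum spanning tree with Prim's algorithm and removes its weakest edge, which is a standard spanning-tree clustering technique. Shi and Malik's normalized cut (Ncut) stands in for RMcut, whose definition is not given here. The `max_ncut` threshold and all function names are hypothetical.

```python
import math
from collections import Counter


def similarity_matrix(docs):
    """Symmetric cosine-similarity matrix over term-frequency vectors (assumed representation).

    The diagonal stays 0, so the weighted undirected graph has no self-loops.
    """
    vecs = [Counter(doc) for doc in docs]
    norms = [math.sqrt(sum(c * c for c in v.values())) for v in vecs]
    n = len(docs)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # an empty document is similar to nothing
            dot = sum(c * vecs[j][t] for t, c in vecs[i].items())
            W[i][j] = W[j][i] = dot / (norms[i] * norms[j])
    return W


def prim_bipartition(W, nodes):
    """Greedy bipartition: grow a maximum spanning tree with Prim's algorithm, cut its weakest edge."""
    root, rest = nodes[0], nodes[1:]
    best = {v: W[root][v] for v in rest}  # strongest link from each outside node into the tree
    parent = {v: root for v in rest}
    edges = []                            # tree edges as (u, v, weight)
    while best:
        v = max(best, key=best.get)       # greedy step: attach the most similar outside node
        edges.append((parent[v], v, best.pop(v)))
        for u in best:                    # only values change, so iterating is safe
            if W[v][u] > best[u]:
                best[u], parent[u] = W[v][u], v
    weakest = min(range(len(edges)), key=lambda k: edges[k][2])
    # Rebuild the tree without its weakest edge and collect the root's component.
    adj = {v: [] for v in nodes}
    for k, (u, v, _) in enumerate(edges):
        if k != weakest:
            adj[u].append(v)
            adj[v].append(u)
    side, stack = {root}, [root]
    while stack:
        for u in adj[stack.pop()]:
            if u not in side:
                side.add(u)
                stack.append(u)
    return sorted(side), sorted(v for v in nodes if v not in side)


def ncut(W, A, B):
    """Normalized cut of (A, B), computed within A | B; a stand-in for RMcut (assumption)."""
    V = A + B
    cut = sum(W[a][b] for a in A for b in B)
    score = 0.0
    for S in (A, B):
        assoc = sum(W[s][t] for s in S for t in V)
        if assoc > 0:
            score += cut / assoc
    return score


def recursive_cluster(W, nodes, max_ncut=0.5):
    """Bisect recursively; keep a node set as one cluster once its best split exceeds max_ncut."""
    if len(nodes) < 2:
        return [nodes]
    A, B = prim_bipartition(W, nodes)
    if ncut(W, A, B) > max_ncut:
        return [nodes]
    return recursive_cluster(W, A, max_ncut) + recursive_cluster(W, B, max_ncut)


docs = [
    "spectral graph clustering partition".split(),
    "graph partition spectral cut".split(),
    "football match goal score".split(),
    "match score football team".split(),
]
W = similarity_matrix(docs)
print(recursive_cluster(W, list(range(len(docs)))))  # [[0, 1], [2, 3]]
```

In this toy input the two topics share no terms, so the first split has Ncut 0 and is accepted. Splitting either two-document cluster further would cut all of its internal similarity (Ncut 2.0), so recursion stops there. Lowering or raising `max_ncut` trades fewer, looser clusters against more, tighter ones.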
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61272088, the Natural Science Foundation for Young Scientists of Gansu Province, China (Grant Nos. 1308TJY085 and 145RJYA259), and the Youth Teacher Scientific Capability Promoting Project of Northwest Normal University (No. NWNU-LKQN-13-23).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, X., He, T., Ran, H., Lu, X. (2016). A Novel Graph Partitioning Criterion Based Short Text Clustering Method. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_32
DOI: https://doi.org/10.1007/978-3-319-42297-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8
eBook Packages: Computer Science, Computer Science (R0)