Abstract
The rapid development of the Internet, social media, and news portals has made a large amount of information available on many subjects. Given such an abundance of resources, effective clustering approaches are valuable. However, traditional clustering models do not perform well enough on web resources because of the high dimensionality of the data. In this paper, we propose a clustering model based on a topic model and density peaks. Our model combines the biterm topic model with clustering by fast search and find of density peaks: it first extracts a set of topical features from the co-occurrence of word pairs (biterms) in the original documents, and then performs clustering analysis on these topical features. Web resources are thus transformed from raw data into clusters, and an evaluation of the clustering results on the center part verifies the effectiveness of the proposed method.
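The two-stage pipeline described above (topical features from biterms, then density-peak clustering on those features) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`extract_biterms`, `train_btm`, `doc_topics`, `density_peaks`), the hyperparameters (`alpha`, `beta`, `iters`, `d_c`), the Euclidean distance between topic vectors, the cutoff kernel for local density, and the automatic choice of centers by the largest ρ·δ values are all assumptions, since the abstract does not specify them. The topic part follows the collapsed Gibbs sampler of the biterm topic model (Yan et al., 2013), and the clustering part follows the ρ/δ construction of Rodriguez and Laio (2014).

```python
import itertools
import math
import random
from collections import Counter


def extract_biterms(tokens):
    # BTM treats every unordered pair of word positions in a short text as a biterm.
    return [tuple(sorted(pair)) for pair in itertools.combinations(tokens, 2)]


def train_btm(docs, n_topics, alpha=1.0, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for BTM; returns (word index, theta, phi)."""
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    M = len(index)
    biterms = [(index[a], index[b]) for d in docs for a, b in extract_biterms(d)]
    n_z = [0] * n_topics                       # biterms assigned to each topic
    n_wz = [[0] * M for _ in range(n_topics)]  # word counts per topic
    z = []

    def update(i, k, step):
        w1, w2 = biterms[i]
        n_z[k] += step
        n_wz[k][w1] += step
        n_wz[k][w2] += step

    for i in range(len(biterms)):              # random initial topic assignment
        z.append(rng.randrange(n_topics))
        update(i, z[i], +1)
    for _ in range(iters):
        for i, (w1, w2) in enumerate(biterms):
            update(i, z[i], -1)                # exclude the current biterm
            # P(z=k | rest) ~ (n_k + a)(n_w1|k + b)(n_w2|k + b) / (2 n_k + M b)^2
            weights = [(n_z[k] + alpha) * (n_wz[k][w1] + beta) * (n_wz[k][w2] + beta)
                       / (2 * n_z[k] + M * beta) ** 2 for k in range(n_topics)]
            z[i] = rng.choices(range(n_topics), weights=weights)[0]
            update(i, z[i], +1)
    theta = [(n_z[k] + alpha) / (len(biterms) + n_topics * alpha) for k in range(n_topics)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + M * beta) for w in range(M)]
           for k in range(n_topics)]
    return index, theta, phi


def doc_topics(tokens, index, theta, phi):
    # P(z|d) = sum_b P(z|b) P(b|d), with P(b|d) the empirical biterm frequency in d.
    counts = Counter((index[a], index[b]) for a, b in extract_biterms(tokens)
                     if a in index and b in index)
    if not counts:
        return list(theta)                     # no known biterms: fall back to theta
    total = sum(counts.values())
    p = [0.0] * len(theta)
    for (w1, w2), c in counts.items():
        pz_b = [theta[k] * phi[k][w1] * phi[k][w2] for k in range(len(theta))]
        s = sum(pz_b)
        for k in range(len(theta)):
            p[k] += (pz_b[k] / s) * (c / total)
    return p


def density_peaks(points, n_clusters, d_c):
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # stable sort: ties keep input order
    delta, nearest_higher = [0.0] * n, [-1] * n
    delta[order[0]] = max(dist[order[0]])            # densest point: largest distance
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda m: dist[i][m])  # nearest denser point
        delta[i], nearest_higher[i] = dist[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    # Heuristic: the densest point plus the remaining points with the largest gamma.
    ranked = [i for i in sorted(range(n), key=lambda i: -gamma[i]) if i != order[0]]
    centers = [order[0]] + ranked[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # descending density, so each neighbor is already labelled
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    docs = [["density", "peak", "cluster"], ["cluster", "density", "center"],
            ["topic", "word", "biterm"], ["biterm", "topic", "model"]]
    index, theta, phi = train_btm(docs, n_topics=2)
    vectors = [doc_topics(d, index, theta, phi) for d in docs]
    print(density_peaks(vectors, n_clusters=2, d_c=0.3))
```

Swapping the cutoff kernel for a Gaussian kernel, or the Euclidean distance for a divergence between topic distributions, only changes `density_peaks`. The sketch picks centers automatically, whereas Rodriguez and Laio select them by inspecting the decision graph of ρ against δ.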
Acknowledgements
The research work described in this article was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E06/14).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhao, S., Wang, F.L., Wong, L.P. (2017). Topic-Level Clustering on Web Resources. In: Wu, TT., Gennari, R., Huang, YM., Xie, H., Cao, Y. (eds) Emerging Technologies for Education. SETE 2016. Lecture Notes in Computer Science, vol 10108. Springer, Cham. https://doi.org/10.1007/978-3-319-52836-6_60
DOI: https://doi.org/10.1007/978-3-319-52836-6_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52835-9
Online ISBN: 978-3-319-52836-6
eBook Packages: Computer Science, Computer Science (R0)