Abstract
Tagging on the Internet has increased dramatically in recent years, and social tagging has become a popular way to organize and share resources. However, tag ambiguity and the sheer number of tags limit its usefulness for searching and classifying resources. Tag clustering groups tags with similar semantics together and thereby helps alleviate these problems. In this paper, we introduce a random walk-based method that measures the relevance between tags by exploiting the relationship between tags and resources. Building on this measure, we develop a novel clustering method, TagClus, which addresses several challenges in tag clustering. Experimental results on a real dataset show that our methods achieve good accuracy and acceptable performance for tag clustering.
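As a rough illustration of how a random walk over tag-resource relationships can yield a relevance score between tags, the sketch below runs a two-step walk (tag, then resource, then tag) on a bipartite graph. It is not the TagClus algorithm or the paper's relevance measure, whose details are not given in this abstract. The function name `two_step_relevance`, the uniform weighting by co-occurrence counts, and the toy annotations are all assumptions made for the example.

```python
from collections import defaultdict


def two_step_relevance(annotations):
    """Return rel[t1][t2]: the probability that a walk starting at tag t1
    lands on tag t2 after one tag -> resource -> tag hop.

    `annotations` is an iterable of (tag, resource) pairs. Transition
    probabilities are proportional to co-occurrence counts (an assumption
    of this sketch, not something stated in the abstract).
    """
    tag_to_res = defaultdict(lambda: defaultdict(int))
    res_to_tag = defaultdict(lambda: defaultdict(int))
    for tag, resource in annotations:
        tag_to_res[tag][resource] += 1
        res_to_tag[resource][tag] += 1

    rel = {}
    for t1, resources in tag_to_res.items():
        total_t1 = sum(resources.values())
        row = defaultdict(float)
        for r, n_tr in resources.items():
            p_tag_to_res = n_tr / total_t1          # step 1: tag -> resource
            total_r = sum(res_to_tag[r].values())
            for t2, n_rt in res_to_tag[r].items():
                # step 2: resource -> tag, accumulated over all resources
                row[t2] += p_tag_to_res * n_rt / total_r
        rel[t1] = dict(row)
    return rel


# Hypothetical toy data: each pair means "this tag was applied to this resource".
annotations = [
    ("python", "doc1"), ("programming", "doc1"),
    ("python", "doc2"), ("snake", "doc2"),
    ("programming", "doc3"), ("java", "doc3"),
]
relevance = two_step_relevance(annotations)
```

In this toy data, "python" reaches "programming" and "snake" through one shared resource each, while "java" shares no resource with "python" and so gets no score from it. This also hints at why tag ambiguity matters: a single tag can be pulled toward unrelated neighbours. A relevance table of this kind could then feed a clustering step, though how TagClus does so is beyond what the abstract states.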
Cite this article
Cui, J., Liu, H., He, J. et al. TagClus: a random walk-based method for tag clustering. Knowl Inf Syst 27, 193–225 (2011). https://doi.org/10.1007/s10115-010-0307-y
DOI: https://doi.org/10.1007/s10115-010-0307-y