TagClus: a random walk-based method for tag clustering

Cui, Jianwei; Liu, Hongyan; He, Jun; Li, Pei; Du, Xiaoyong; Wang, Puwei

doi:10.1007/s10115-010-0307-y

Regular Paper
Published: 13 June 2010

Volume 27, pages 193–225 (2011)

Knowledge and Information Systems

Jianwei Cui^1,2, Hongyan Liu^3, Jun He^1,2, Pei Li^1,2, Xiaoyong Du^1,2 & Puwei Wang^1,2

Abstract

Tagging on the Internet has increased dramatically in recent years, and social tagging has become a popular way to organize and share resources. However, tag ambiguity and the sheer number of tags limit its usefulness for searching and classifying resources. Tag clustering groups tags with similar semantics together and thus helps alleviate these problems. In this paper, we introduce a random walk-based method that measures relevance between tags by exploiting the relationship between tags and resources. Based on this measure, we develop a novel clustering method, TagClus, which addresses several challenges in tag clustering. Experimental results on a real dataset show that our methods achieve good accuracy and acceptable performance for tag clustering.
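
The abstract does not spell out how the random walk is defined, so the following is only an illustrative sketch and not the TagClus algorithm itself. It assumes the simplest possible reading: a two-step walk on the bipartite tag-resource graph (tag to resource, then resource back to tag), with transition probabilities proportional to how often a tag was applied to a resource. The function name `tag_relevance` and the `(tag, resource)` input format are hypothetical.

```python
from collections import defaultdict


def tag_relevance(assignments):
    """Two-step random walk tag -> resource -> tag (illustrative assumption).

    assignments: iterable of (tag, resource) pairs; repeated pairs act as weights.
    Returns {tag: {other_tag: probability of landing on other_tag}}.
    """
    tag_to_res = defaultdict(lambda: defaultdict(int))
    res_to_tag = defaultdict(lambda: defaultdict(int))
    for tag, res in assignments:
        tag_to_res[tag][res] += 1
        res_to_tag[res][tag] += 1

    relevance = {}
    for tag, resources in tag_to_res.items():
        out_total = sum(resources.values())
        probs = defaultdict(float)
        for res, weight in resources.items():
            # Step 1: move from the tag to one of its resources.
            p_res = weight / out_total
            in_total = sum(res_to_tag[res].values())
            # Step 2: move from that resource to one of its tags.
            for other, weight_back in res_to_tag[res].items():
                probs[other] += p_res * weight_back / in_total
        relevance[tag] = dict(probs)
    return relevance


# Hypothetical toy data: three resources, three tags.
pairs = [
    ("python", "r1"), ("code", "r1"),
    ("python", "r2"), ("snake", "r2"),
    ("code", "r3"),
]
rel = tag_relevance(pairs)
```

In this toy data, "python" shares one resource with "code" and one with "snake", so the walk starting at "python" reaches each of them with equal probability, while "code" never reaches "snake". A clustering step could then group tags whose walk distributions are similar; how TagClus actually does this is described only in the full article.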

Author information

Authors and Affiliations

Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing, China
Jianwei Cui, Jun He, Pei Li, Xiaoyong Du & Puwei Wang
School of Information, Renmin University of China, 100872, Beijing, China
Jianwei Cui, Jun He, Pei Li, Xiaoyong Du & Puwei Wang
School of Economics and Management, Tsinghua University, 100084, Beijing, China
Hongyan Liu

Corresponding authors

Correspondence to Hongyan Liu or Jun He.

About this article

Cite this article

Cui, J., Liu, H., He, J. et al. TagClus: a random walk-based method for tag clustering. Knowl Inf Syst 27, 193–225 (2011). https://doi.org/10.1007/s10115-010-0307-y

Received: 31 October 2009
Revised: 12 April 2010
Accepted: 15 May 2010
Published: 13 June 2010
Issue Date: May 2011
DOI: https://doi.org/10.1007/s10115-010-0307-y
