TagClus: a random walk-based method for tag clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

Tagging behavior on the Internet has seen a dramatic increase in recent years, and social tagging has become a popular way to organize and share resources. However, the ambiguity and sheer number of tags restrict their effective use for searching and classifying resources. Tag clustering groups tags with similar semantics together and thus helps alleviate these problems. In this paper, we introduce a random walk-based method that measures the relevance between tags by exploiting the relationship between tags and resources. Based on this measure, we also develop a novel clustering method, TagClus, which addresses several challenges in tag clustering. Experimental results on a real dataset show that our methods achieve good accuracy and acceptable performance for tag clustering.
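To make the idea of random walk-based tag relevance concrete, the following is a minimal sketch and not the method defined in the paper. It assumes the simplest possible walk on the bipartite tag-resource graph: from a tag, step to one of its resources chosen uniformly, then step to one of that resource's tags chosen uniformly. The function name `tag_transition_probabilities` and the toy annotations are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tag_transition_probabilities(annotations):
    """Two-step random walk (tag -> resource -> tag) on a bipartite graph.

    `annotations` is an iterable of (tag, resource) pairs. The result maps
    each tag to a dict of {other_tag: probability of landing on other_tag}.
    This is an illustrative assumption, not the relevance measure of TagClus.
    """
    tag_to_resources = defaultdict(set)
    resource_to_tags = defaultdict(set)
    for tag, resource in annotations:
        tag_to_resources[tag].add(resource)
        resource_to_tags[resource].add(tag)

    relevance = {}
    for tag, resources in tag_to_resources.items():
        row = defaultdict(float)
        # First step: pick one of the tag's resources uniformly at random.
        p_resource = 1.0 / len(resources)
        for resource in resources:
            tags_here = resource_to_tags[resource]
            # Second step: pick one of that resource's tags uniformly.
            for other in tags_here:
                row[other] += p_resource / len(tags_here)
        relevance[tag] = dict(row)
    return relevance


# Hypothetical toy data: "python" is ambiguous (programming vs. animal).
annotations = [
    ("python", "r1"), ("code", "r1"),
    ("python", "r2"), ("snake", "r2"),
    ("code", "r3"),
]
relevance = tag_transition_probabilities(annotations)
print(relevance["python"])
```

In this toy example the ambiguous tag "python" spreads its probability mass over both "code" and "snake", while "code" never reaches "snake" in two steps. Each row sums to 1, so the rows together form a row-stochastic transition matrix over tags.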

Author information

Correspondence to Hongyan Liu or Jun He.

About this article

Cite this article

Cui, J., Liu, H., He, J. et al. TagClus: a random walk-based method for tag clustering. Knowl Inf Syst 27, 193–225 (2011). https://doi.org/10.1007/s10115-010-0307-y

  • DOI: https://doi.org/10.1007/s10115-010-0307-y
