A New Vector Space Model Exploiting Semantic Correlations of Social Annotations for Web Page Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

Text clustering can effectively improve the search results and user experience of information retrieval systems. Traditional text clustering approaches are based on the vector space model, in which a document is represented as a vector using a term-frequency-based weighting scheme. The main disadvantage of this model is that it cannot fully exploit semantic correlations between social annotations and document contents, because a term-frequency-based weighting scheme captures only the number of occurrences of each term in the document. However, social annotations of web pages carry fundamental and valuable semantic information and can therefore be used to improve information retrieval systems. In this paper, we investigate and evaluate several extended vector space models that combine social annotations with web page text. In particular, we propose a novel vector space model that computes the semantic correlations between social annotations and web page words. Our experiments show that, compared with the other vector space models, using semantic correlations between social tags and web page words improves clustering accuracy, with an RI score increase of 4% to 7%.
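To make the term-frequency representation described in the abstract concrete, the following is a minimal Python sketch of a plain term-frequency vector space model, together with one naive way of folding tag counts into the same vector. It is not the model proposed in the paper, whose correlation formula is not given in the abstract. The function names, the `tag_weight` parameter, and the toy pages are all assumptions made for illustration.

```python
from collections import Counter
import math


def tf_vector(tokens, vocabulary):
    """Return raw term-frequency counts for `tokens` over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_vector(words, tags, vocabulary, tag_weight=1.0):
    """Hypothetical baseline: add weighted tag counts to page-word counts.

    This is an illustrative assumption, not the paper's semantic-correlation model.
    """
    word_tf = tf_vector(words, vocabulary)
    tag_tf = tf_vector(tags, vocabulary)
    return [w + tag_weight * t for w, t in zip(word_tf, tag_tf)]


# Toy data (invented): two pages with their words and user-assigned tags.
vocabulary = ["python", "tutorial", "recipe", "programming"]
page_a_words, page_a_tags = ["python", "tutorial", "python"], ["programming"]
page_b_words, page_b_tags = ["programming", "tutorial"], ["python"]

text_only = cosine(tf_vector(page_a_words, vocabulary),
                   tf_vector(page_b_words, vocabulary))
with_tags = cosine(combined_vector(page_a_words, page_a_tags, vocabulary),
                   combined_vector(page_b_words, page_b_tags, vocabulary))

print(f"text only: {text_only:.3f}, with tags: {with_tags:.3f}")
```

In this toy case the two pages share only one word, so their text-only similarity is low, while the tags supply the overlapping terms that the page text lacks. A clustering algorithm fed the combined vectors would therefore be more likely to group the pages together, which is the kind of effect the extended models in the paper aim to exploit.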

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gu, X., Wang, X., Li, R., Wen, K., Yang, Y., Xiao, W. (2011). A New Vector Space Model Exploiting Semantic Correlations of Social Annotations for Web Page Clustering. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_11

  • DOI: https://doi.org/10.1007/978-3-642-23535-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
