
Predicting User Tags in Social Media Repositories Using Semantic Expansion and Visual Analysis

  • Chapter
Social Media Retrieval

Part of the book series: Computer Communications and Networks (CCN)


Abstract

Manually annotating large-scale content such as Internet videos is an expensive and time-consuming process. Furthermore, community-provided tags lack consistency and present numerous irregularities. This chapter aims to provide a forum for state-of-the-art research in this emerging field, with particular focus on mechanisms capable of exploiting the full range of information available online to predict user tags automatically. The exploited information covers both semantic metadata, including complementary information in external resources, and low-level features embedded within the multimedia content. The chapter also presents a framework for predicting general tags from the associated textual metadata and visual features. The goal of this framework is to simplify and improve the process of tagging online videos, which are not bound to any particular domain. The first step in the framework is to extract named entities by exploiting complementary textual resources such as Wikipedia and WordNet. To facilitate the extraction of semantically meaningful tags from a largely unstructured textual corpus, the framework employs GATE natural language processing tools. Extending the built-in named-entity functionality of GATE, the framework also integrates a bag-of-articles algorithm for effectively extracting relevant articles from Wikipedia. The framework was validated experimentally on the tagging task of the MediaEval 2010 Wild Wild Web dataset.


Notes

  1. http://www.flickr.com/

  2. www.wikipedia.org/

  3. www.youtube.com/

  4. http://www.facebook.com/

  5. secondlife.com/

  6. twitter.com/

  7. http://gate.ac.uk/

  8. http://www.opencalais.com/

  9. The first paragraph of a Wikipedia article usually contains the definition of the article's subject, so it can be expected to contain more relevant words than the rest of the text.

  10. http://www.mediawiki.org/wiki/Extension:Lucene-search

  11. http://lucene.apache.org

  12. A is said to be related to B if A links to B and there is some C that links to both A and B (source: Lucene-Search Extension documentation).

  13. http://code.google.com/p/matrix-toolkits-java/
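
The relatedness definition in note 12 can be illustrated with a minimal sketch. It is not the Lucene-Search extension's code: the function name `is_related`, the adjacency-dictionary representation of the link graph, the requirement that C differ from A and B, and the page titles in the example are all assumptions made for illustration.

```python
from typing import Dict, Set


def is_related(links: Dict[str, Set[str]], a: str, b: str) -> bool:
    """Check the note-12 condition on a link graph.

    `links` maps each page title to the set of titles it links to.
    Returns True when a links to b and some third page c links to
    both a and b. Treating c as distinct from a and b is an assumption.
    """
    # First condition: a must link directly to b.
    if b not in links.get(a, set()):
        return False
    # Second condition: some other page c links to both a and b.
    return any(
        a in targets and b in targets
        for c, targets in links.items()
        if c not in (a, b)
    )


# Hypothetical toy link graph (page titles are illustrative only).
links = {
    "GATE": {"Named-entity recognition"},
    "Information extraction": {"GATE", "Named-entity recognition"},
    "Named-entity recognition": set(),
}

print(is_related(links, "GATE", "Named-entity recognition"))
print(is_related(links, "Named-entity recognition", "GATE"))
```

In this toy graph the relation holds from "GATE" to "Named-entity recognition" but not in the reverse direction, since the definition requires a direct link from A to B.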


Author information


Corresponding author

Correspondence to Tomas Piatrik.


Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Piatrik, T., Zhang, Q., Sevillano, X., Izquierdo, E. (2013). Predicting User Tags in Social Media Repositories Using Semantic Expansion and Visual Analysis. In: Ramzan, N., van Zwol, R., Lee, JS., Clüver, K., Hua, XS. (eds) Social Media Retrieval. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4555-4_7


  • DOI: https://doi.org/10.1007/978-1-4471-4555-4_7


  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4554-7

  • Online ISBN: 978-1-4471-4555-4

  • eBook Packages: Computer Science (R0)
