Predicting User Tags in Social Media Repositories Using Semantic Expansion and Visual Analysis

Piatrik, Tomas; Zhang, Qianni; Sevillano, Xavier; Izquierdo, Ebroul

doi:10.1007/978-1-4471-4555-4_7

Part of the book series: Computer Communications and Networks (CCN)

Abstract

Manually annotating large-scale content such as Internet videos is an expensive and time-consuming process. Furthermore, community-provided tags lack consistency and present numerous irregularities. This chapter aims to present state-of-the-art research in this emerging field, with particular focus on mechanisms capable of exploiting the full range of information available online to predict user tags automatically. The exploited information covers both semantic metadata, including complementary information in external resources, and low-level features embedded within the multimedia content. Furthermore, this chapter presents a framework for predicting general tags from the associated textual metadata and visual features. The goal of this framework is to simplify and improve the process of tagging online videos, which are not bound to any particular domain. In this framework, the first step is to extract named entities by exploiting complementary textual resources such as Wikipedia and WordNet. To facilitate the extraction of semantically meaningful tags from a largely unstructured textual corpus, the framework employs the GATE natural language processing tools. Extending the functionality of GATE's built-in named entity extraction, the framework also integrates a bag-of-articles algorithm for effectively extracting relevant articles from Wikipedia. Experiments were conducted to validate the framework on the MediaEval 2010 Wild Wild Web dataset for the tagging task.

Notes

1. http://www.flickr.com/
2. www.wikipedia.org/
3. www.youtube.com/
4. http://www.facebook.com/
5. secondlife.com/
6. twitter.com/
7. http://gate.ac.uk/
8. http://www.opencalais.com/
9. The first paragraph of a Wikipedia article usually contains the definition of the article's subject; it can therefore be expected to contain more relevant words than the rest of the text.
10. http://www.mediawiki.org/wiki/Extension:Lucene-search
11. http://lucene.apache.org
12. A is said to be related to B if A links to B and there is some C that links to both A and B (source: Lucene-Search Extension documentation).
13. http://code.google.com/p/matrix-toolkits-java/
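
The relatedness definition in note 12 can be sketched as a small check over a link graph. This is only an illustration under stated assumptions: the link graph is given as a dictionary mapping each article to the set of articles it links to, C is required to differ from both A and B, and the function name `is_related` and the toy articles are hypothetical rather than taken from the Lucene-Search extension.

```python
def is_related(links, a, b):
    """Return True if `a` links to `b` and some third article links to both.

    `links` maps an article title to the set of titles it links to.
    Assumption: the witness article C must be distinct from A and B.
    """
    # Condition 1: A links to B.
    if b not in links.get(a, set()):
        return False
    # Condition 2: there is some C that links to both A and B.
    return any(
        a in targets and b in targets
        for c, targets in links.items()
        if c != a and c != b
    )


# Toy link graph (hypothetical data, for illustration only).
links = {
    "London": {"England"},
    "Thames": {"London", "England"},
    "Paris": {"France"},
}

print(is_related(links, "London", "England"))  # "Thames" links to both
print(is_related(links, "Paris", "France"))    # no article links to both
```

Note that the relation is not symmetric: in the toy graph, "England" does not link to "London", so the check fails in that direction.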

Author information

Authors and Affiliations

School of EE and CS, Queen Mary University of London, Mile End Road, E1 4NS, London, UK
Tomas Piatrik, Qianni Zhang & Ebroul Izquierdo
La Salle - Universitat Ramon Llull, Barcelona, Spain
Xavier Sevillano

Corresponding author

Correspondence to Tomas Piatrik.

Editor information

Editors and Affiliations

Department of Electronic Engineering, Queen Mary University of London, Mile End Road, London, E1 4NS, United Kingdom
Naeem Ramzan
Director of Product Innovation, Search, Netflix, Winchester Circle 100, Los Gatos, 95032, California, USA
Roelof van Zwol
School of Integrated Technology, Yonsei University, 162-1 Songdo-dong, Yeonsu-gu, Incheon, 406-840, Korea, Republic of (South Korea)
Jong-Seok Lee
Institut für Telekommunikationssysteme, Technische Universität Berlin, Einsteinufer 17, Berlin, 10587, Germany
Kai Clüver
Media Computing Group, Microsoft Research, 555 108th Ave NE, Bellevue, 98004, Washington, USA
Xian-Sheng Hua

About this chapter

Cite this chapter

Piatrik, T., Zhang, Q., Sevillano, X., Izquierdo, E. (2013). Predicting User Tags in Social Media Repositories Using Semantic Expansion and Visual Analysis. In: Ramzan, N., van Zwol, R., Lee, JS., Clüver, K., Hua, XS. (eds) Social Media Retrieval. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4555-4_7

DOI: https://doi.org/10.1007/978-1-4471-4555-4_7
Published: 13 October 2012
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4554-7
Online ISBN: 978-1-4471-4555-4
eBook Packages: Computer Science, Computer Science (R0)
