Extracting semantic knowledge from web context for multimedia IR: a taxonomy, survey and challenges

Multimedia Tools and Applications

Abstract

Since its invention, the Web has evolved into the largest multimedia repository that has ever existed. This evolution is a direct result of the explosion of user-generated content, explained by the wide adoption of social network platforms. The vast amount of multimedia content requires effective management and retrieval techniques. Nevertheless, Web multimedia retrieval is a complex task because users commonly express their information needs in semantic terms, but expect multimedia content in return. This dissociation between semantics and content of multimedia is known as the semantic gap. To bridge this gap, researchers are looking beyond content-based or text-based approaches, integrating novel data sources. New data sources can consist of any type of data extracted from the context of multimedia documents, defined as the data that is not part of the raw content of a multimedia file. The Web is an extraordinary source of context data, which can be found in explicit or implicit relation to multimedia objects, such as surrounding text, tags, hyperlinks, and even relevance feedback. Recent advances in Web multimedia retrieval have shown that context data has great potential to bridge the semantic gap. In this article, we present the first comprehensive survey of context-based approaches for multimedia information retrieval on the Web. We introduce a data-driven taxonomy, which we then use in our literature review of the most emblematic and important approaches that use context-based data. In addition, we identify important challenges and opportunities that have not previously been addressed in this area.

Notes

  1. http://www.facebook.com

  2. http://instagram.com

  3. http://www.youtube.com

  4. http://flickr.com

  5. It is clear that these search engines relied heavily on context data in their beginnings, although their technology is proprietary and it is very likely that they are currently using content data too.

  6. http://images.google.com

  7. https://www.flickr.com/search/

  8. http://images.yahoo.com/

  9. We refer to tags as text associated with a multimedia resource, and to annotations as text associated with only part of the multimedia object (e.g., a sub-area of an image or a fragment of an audio recording).

  10. It refers to a discrete color distribution manually specified by the user.

  11. http://trec.nist.gov

  12. http://www.clef-initiative.eu

  13. http://imageclef.org/2016

  14. http://www.multimediaeval.org

  15. http://traces.cs.umass.edu/index.php/Mmsys/Mmsys

  16. http://www.aimatshape.net/event/SHREC

  17. https://research.google.com/research-outreach.html#/research-outreach/research-datasets

  18. http://research.microsoft.com/en-US/projects/data-science-initiative/datasets.aspx

  19. https://webscope.sandbox.yahoo.com/#datasets

Acknowledgements

This work was partially supported by the Millennium Nucleus Center for Semantic Web Research, Grant No. NC120004. In addition, B. Poblete was also partially supported by Project Enlace-Fondecyt ENL011/16 and Project Fondef ID16—10222. T. Bracamonte was also supported by PhD Scholarship Program of Conicyt, Chile (CONICYT-PCHA/Doctorado Nacional/2013-63130260).

Author information

Corresponding author

Correspondence to Teresa Bracamonte.

About this article

Cite this article

Bracamonte, T., Bustos, B., Poblete, B. et al. Extracting semantic knowledge from web context for multimedia IR: a taxonomy, survey and challenges. Multimed Tools Appl 77, 13853–13889 (2018). https://doi.org/10.1007/s11042-017-4997-y

  • DOI: https://doi.org/10.1007/s11042-017-4997-y
