Multimodal concept fusion using semantic closeness for image concept disambiguation

Multimedia Tools and Applications

Abstract

In this paper we show how to resolve the ambiguity of concepts extracted from the visual stream with the help of concepts identified in the associated textual stream. Disambiguation is performed at the concept level, based on semantic closeness over a domain ontology. Semantic closeness is a function of the distance in the ontology between the concept to be disambiguated and selected associated concepts. In this process, an image concept can be disambiguated using any associated concept from the image, the text, or both. Text concepts vary in their ability to resolve ambiguity in image concepts. The best case occurs when the same concept is stated clearly in both image and text; the worst case occurs when the image concept is isolated, with no semantically close text concept. The domain ontology used in the disambiguation process is constructed from WordNet and the image labels with selected senses. The results show improved accuracy, which supports the effectiveness of the proposed disambiguation process.
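The idea of scoring candidate image concepts by their ontology distance to associated concepts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy ontology, the function names, and the closeness formula 1 / (1 + path length) are all assumptions, since the abstract only states that closeness is a function of ontology distance.

```python
from collections import deque

# Hypothetical toy ontology: undirected is-a links between concept names.
TOY_EDGES = [
    ("entity", "animal"), ("entity", "artifact"),
    ("animal", "mammal"), ("mammal", "dolphin"),
    ("animal", "fish"), ("fish", "shark"),
    ("artifact", "vehicle"), ("vehicle", "boat"),
    ("vehicle", "submarine"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (parent, child) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, source, target):
    """Shortest number of edges between two concepts, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def closeness(graph, a, b):
    """Assumed closeness: 1 / (1 + distance); 0.0 when no path exists."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def disambiguate(graph, candidates, context):
    """Pick the candidate with the highest total closeness to the context.

    Returns (best_candidate_or_None, scores). None means the context gave
    no evidence, e.g. an isolated image concept.
    """
    if not candidates:
        return None, {}
    scores = {c: sum(closeness(graph, c, t) for t in context) for c in candidates}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    graph = build_graph(TOY_EDGES)
    # An image region labelled ambiguously, with concepts found in the text.
    print(disambiguate(graph, ["shark", "submarine"], ["boat", "vehicle"]))
```

Under this assumed formula, a concept that appears in both image and text has distance 0 and the maximum closeness of 1.0, which mirrors the best case described above, while a context concept with no path to any candidate contributes nothing, which mirrors the isolated-concept worst case.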

Acknowledgments

This work was supported by a Research University grant titled ‘Multimodal Meaning Normalization through Ontologies’ (No. 1001/PKOMP/811021).

Author information

Corresponding author

Correspondence to Rajeswari Mandava.

About this article

Cite this article

Abu-Shareha, A.A., Mandava, R., Khan, L. et al. Multimodal concept fusion using semantic closeness for image concept disambiguation. Multimed Tools Appl 61, 69–86 (2012). https://doi.org/10.1007/s11042-010-0707-8

  • DOI: https://doi.org/10.1007/s11042-010-0707-8
