Abstract
In this paper we show how to resolve the ambiguity of concepts extracted from the visual stream with the help of concepts identified in the associated textual stream. Disambiguation is performed at the concept level, based on semantic closeness over the domain ontology, where semantic closeness is a function of the distance in the ontology between the concept to be disambiguated and selected associated concepts. In this process, each image concept is disambiguated using any associated concepts from the image and/or the text. The ability of text concepts to resolve the ambiguity of image concepts varies. It is greatest when the same concept is stated clearly in both the image and the text, and weakest when the image concept is isolated, that is, when it has no semantically close text concept. WordNet and the image labels, with selected senses, are used to construct the domain ontology used in the disambiguation process. The improved accuracy shown in the results demonstrates the effectiveness of the proposed disambiguation process.
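The abstract defines semantic closeness only as "a function of the distance" in the ontology, so the following is a minimal sketch under stated assumptions, not the authors' method. It assumes closeness = 1 / (1 + shortest-path length) and picks the candidate sense of an ambiguous image concept with the highest summed closeness to the associated image/text concepts. The toy ontology, the sense labels (e.g. `bat#club`), and the function names are hypothetical illustrations, not the WordNet-derived ontology used in the paper.

```python
from collections import deque

# Hypothetical toy ontology (NOT the paper's WordNet-derived ontology):
# is-a edges between sense-tagged concepts, treated as undirected.
EDGES = [
    ("entity", "animal"), ("entity", "artifact"),
    ("animal", "mammal"), ("mammal", "bat#animal"), ("animal", "bird"),
    ("artifact", "sports_equipment"),
    ("sports_equipment", "bat#club"), ("sports_equipment", "ball"),
    ("sports_equipment", "glove"),
]


def build_graph(edges):
    """Adjacency sets for an undirected concept graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, src, dst):
    """Shortest path length in edges (breadth-first search); None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def closeness(graph, a, b):
    """Assumed closeness function: 1 / (1 + distance), or 0.0 if unreachable."""
    d = distance(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def disambiguate(graph, candidate_senses, context_concepts):
    """Return (best sense, scores); best is None when no context concept is close."""
    scores = {
        sense: sum(closeness(graph, sense, c) for c in context_concepts)
        for sense in candidate_senses
    }
    best = max(scores, key=scores.get)
    # Isolated image concept: nothing in the context is reachable, so it stays ambiguous.
    if scores[best] == 0:
        return None, scores
    return best, scores


if __name__ == "__main__":
    g = build_graph(EDGES)
    sense, _ = disambiguate(g, ["bat#animal", "bat#club"], ["ball", "glove"])
    print(sense)  # bat#club
```

Here the associated concepts "ball" and "glove" are two edges from `bat#club` but six from `bat#animal`, so the sports sense wins. A context concept absent from the ontology (e.g. "sunset") gives every sense a score of zero, mirroring the worst case described above.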
Acknowledgments
This work was supported by a Research University grant titled ‘Multimodal Meaning Normalization through Ontologies’ (No:1001/PKOMP/811021).
About this article
Cite this article
Abu-Shareha, A.A., Mandava, R., Khan, L. et al. Multimodal concept fusion using semantic closeness for image concept disambiguation. Multimed Tools Appl 61, 69–86 (2012). https://doi.org/10.1007/s11042-010-0707-8
DOI: https://doi.org/10.1007/s11042-010-0707-8