ABSTRACT
Multimedia information retrieval stands to benefit from additional information about tags and how they relate to the content visually depicted in images. We propose a generic approach that improves the informativeness of image tags by combining generalizations about the distributional tendencies of physical objects in the real world with statistics of natural-language use patterns mined from the Web. The approach, which we refer to as 'Reading between the Tags,' provides two predictions for each tag associated with an image: first, a corporeality prediction, i.e., whether or not the tag denotes a physical entity, and second, a prediction of the real-world size of that entity, i.e., large, medium, or small. Mining is carried out using a set of Language Use Frames (LUFs), which are composed of natural-language neighborhoods characteristic of tag classes. We validate our approach with a series of experiments on images from the MIRFLICKR data set, using ground truth created with standard crowdsourcing techniques. The main experiments demonstrate the effectiveness of our approach for size-class prediction. A further experiment shows that size-class prediction can be improved and made image-specific using general and relatively small sets of visual concepts. A final experiment confirms that the set of LUFs can also be chosen automatically via statistical feature selection.
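To make the idea concrete, the following is a minimal, hypothetical sketch of how LUF-based size-class scoring could work. The pattern strings, the grouping of LUFs by size class, and all hit counts below are invented for illustration; the paper's actual LUF inventory and Web-mined statistics are not reproduced here.

```python
# Hypothetical sketch of the "Reading between the Tags" idea: score a tag
# against Language Use Frames (LUFs) -- natural-language neighborhoods such
# as "tiny <tag>" vs. "huge <tag>" -- using simulated web hit counts.
# All patterns and counts are invented for illustration only.
import math

# Invented LUFs, grouped by the size class they are taken to indicate.
LUFS = {
    "small": ["tiny {t}", "{t} in my hand", "little {t}"],
    "medium": ["carry the {t}", "{t} on the table"],
    "large": ["huge {t}", "inside the {t}", "climb the {t}"],
}

def size_class(tag, hit_count):
    """Predict a size class for `tag` from per-pattern hit counts.

    `hit_count` maps an instantiated pattern string to its (simulated)
    web frequency. Scores are summed in log space so that a single very
    frequent pattern does not completely dominate the decision."""
    scores = {}
    for cls, patterns in LUFS.items():
        scores[cls] = sum(
            math.log1p(hit_count.get(p.format(t=tag), 0)) for p in patterns
        )
    return max(scores, key=scores.get)

# Toy counts, as if mined from a web corpus.
counts = {
    "tiny ant": 900, "ant in my hand": 40, "little ant": 300,
    "huge building": 5000, "inside the building": 12000,
    "climb the building": 150,
}
print(size_class("ant", counts))       # prints "small"
print(size_class("building", counts))  # prints "large"
```

In the paper's setting, the counts would come from Web-scale language statistics rather than a hand-built dictionary, and the set of LUFs could itself be selected automatically via statistical feature selection, as the final experiment confirms.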