DOI: 10.1145/2072298.2072335
Research article

Reading between the tags to predict real-world size-class for visually depicted objects in images

Published: 28 November 2011

ABSTRACT

Multimedia information retrieval stands to benefit from the availability of additional information about tags and how they relate to the content visually depicted in images. We propose a generic approach that contributes to improving the informativeness of image tags by combining generalizations about the distributional tendencies of physical objects in the real world and statistics of natural language use patterns that have been mined from the Web. The approach, which we refer to as 'Reading between the Tags,' provides for each tag associated with an image, first, a prediction concerning corporeality, i.e., whether or not the tag denotes a physical entity, and, then, concerning the real-world size of that entity, i.e., large, medium or small. Mining takes place using a set of Language Use Frames (LUFs) that are composed of natural language neighborhoods characteristic of tag classes. We validate our approach with a series of experiments on a set of images from the MIRFLICKR data set using ground truth created with standard crowdsourcing techniques. The main experiments demonstrate the effectiveness of our approach for size-class prediction. A further experiment shows that size-class prediction can be improved and made image-specific using general and relatively small sets of visual concepts. A final experiment confirms that the set of LUFs can also be chosen automatically via statistical feature selection.
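As a loose illustration (not the authors' implementation), the core idea of predicting a tag's real-world size-class from natural-language neighborhood statistics can be sketched as follows. The patterns, counts, and pattern-to-class mapping below are invented stand-ins for the web-mined Language Use Frames described in the abstract:

```python
# Hypothetical sketch: predict a size-class for a tag from counts of
# natural-language "neighborhood" patterns (stand-ins for the paper's
# Language Use Frames). All counts here are invented for illustration;
# a real system would mine them from web-scale text.

# Invented frequencies: how often each phrase co-occurs with a tag.
PATTERN_COUNTS = {
    "ant":      {"a tiny":  90, "inside the": 2,  "standing next to the": 1},
    "bicycle":  {"a tiny":   3, "inside the": 5,  "standing next to the": 40},
    "building": {"a tiny":   1, "inside the": 80, "standing next to the": 30},
}

# Each pattern votes for one size-class (small / medium / large).
PATTERN_CLASS = {
    "a tiny": "small",
    "standing next to the": "medium",   # human-scale objects
    "inside the": "large",              # things big enough to contain a person
}

def predict_size_class(tag):
    """Return the size-class whose patterns fire most often for the tag."""
    counts = PATTERN_COUNTS.get(tag)
    if counts is None:
        return None  # unknown tag, or one that denotes no physical entity
    votes = {"small": 0, "medium": 0, "large": 0}
    for pattern, n in counts.items():
        votes[PATTERN_CLASS[pattern]] += n
    return max(votes, key=votes.get)

print(predict_size_class("ant"))       # small
print(predict_size_class("building"))  # large
```

In this toy version, returning `None` plays the role of the corporeality check that precedes size-class prediction in the paper's pipeline.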


Published in:
MM '11: Proceedings of the 19th ACM International Conference on Multimedia
November 2011, 944 pages
ISBN: 9781450306164
DOI: 10.1145/2072298

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



      Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%

