ABSTRACT
Multimedia information retrieval stands to benefit from additional information about tags and how they relate to the content visually depicted in images. We propose a generic approach that improves the informativeness of image tags by combining generalizations about the distributional tendencies of physical objects in the real world with statistics of natural-language use patterns mined from the Web. The approach, which we refer to as 'Reading between the Tags,' provides two predictions for each tag associated with an image: first, a corporeality prediction, i.e., whether or not the tag denotes a physical entity, and second, a prediction of the real-world size of that entity, i.e., large, medium, or small. Mining is carried out using a set of Language Use Frames (LUFs), which are composed of natural-language neighborhoods characteristic of tag classes. We validate our approach with a series of experiments on images from the MIRFLICKR data set, using ground truth created with standard crowdsourcing techniques. The main experiments demonstrate the effectiveness of our approach for size-class prediction. A further experiment shows that size-class prediction can be improved and made image-specific using general and relatively small sets of visual concepts. A final experiment confirms that the set of LUFs can also be chosen automatically via statistical feature selection.
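To make the idea concrete, the following is a minimal, hypothetical sketch of how LUF-based size-class scoring could work. The pattern strings, the grouping of LUFs by size class, and all hit counts below are invented for illustration; the paper's actual LUF inventory and Web-mined statistics are not reproduced here.

```python
# Hypothetical sketch of the "Reading between the Tags" idea: score a tag
# against Language Use Frames (LUFs) -- natural-language neighborhoods such
# as "tiny <tag>" vs. "huge <tag>" -- using simulated web hit counts.
# All patterns and counts are invented for illustration only.
import math

# Invented LUFs, grouped by the size class they are taken to indicate.
LUFS = {
    "small": ["tiny {t}", "{t} in my hand", "little {t}"],
    "medium": ["carry the {t}", "{t} on the table"],
    "large": ["huge {t}", "inside the {t}", "climb the {t}"],
}

def size_class(tag, hit_count):
    """Predict a size class for `tag` from per-pattern hit counts.

    `hit_count` maps an instantiated pattern string to its (simulated)
    web frequency. Scores are summed in log space so that a single very
    frequent pattern does not completely dominate the decision."""
    scores = {}
    for cls, patterns in LUFS.items():
        scores[cls] = sum(
            math.log1p(hit_count.get(p.format(t=tag), 0)) for p in patterns
        )
    return max(scores, key=scores.get)

# Toy counts, as if mined from a web corpus.
counts = {
    "tiny ant": 900, "ant in my hand": 40, "little ant": 300,
    "huge building": 5000, "inside the building": 12000,
    "climb the building": 150,
}
print(size_class("ant", counts))       # prints "small"
print(size_class("building", counts))  # prints "large"
```

In the paper's setting, the counts would come from Web-scale language statistics rather than a hand-built dictionary, and the set of LUFs could itself be selected automatically via statistical feature selection, as the final experiment confirms.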