ABSTRACT
Recent advances in visual concept detection based on deep convolutional neural networks have been possible only because of the availability of huge training datasets provided by benchmarking initiatives such as ImageNet. Assembling reliably annotated training data is still a largely manual effort that can only be approached efficiently through crowd-working tasks. On the other hand, user-generated photos and annotations are available at almost no cost in social photo communities such as Flickr. Leveraging the information available in these communities may help to extend existing datasets as well as to create new ones for entirely different classification scenarios. However, user-generated photo annotations are known to be incomplete and subjective, and they do not necessarily relate to the depicted content. In this paper, we therefore present an approach to reliably identify photos relevant to a given visual concept category. We have downloaded additional metadata for 1 million Flickr images and have trained a language model on the user-generated annotations. Relevance estimation is based on the accordance of an image's annotation data with our language model and on subsequent visual re-ranking. Experimental results demonstrate the potential of the proposed method: comparison with a baseline approach based on single-tag matching shows significant improvements.
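The core idea of scoring a photo by how well its user tags agree with a language model learned from community annotations can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the toy corpus, the PMI-based association measure, and all function names are assumptions (the actual system trains a word2vec model on 1 million Flickr annotations and adds visual re-ranking).

```python
# Hypothetical sketch of tag-based relevance estimation: score how well a
# photo's tags match a target concept using tag co-occurrence statistics
# as a stand-in for the paper's learned language model.
from collections import Counter
from itertools import combinations
import math

# Toy "Flickr" corpus: each photo is a set of user tags (illustrative data).
corpus = [
    ["beach", "sea", "sand", "sunset"],
    ["beach", "ocean", "waves"],
    ["cat", "pet", "cute"],
    ["sea", "waves", "sunset"],
    ["cat", "kitten", "pet"],
]

# Count single tags and tag pairs to estimate co-occurrence probabilities.
tag_counts = Counter(t for tags in corpus for t in set(tags))
pair_counts = Counter(
    frozenset(p) for tags in corpus for p in combinations(sorted(set(tags)), 2)
)
n_photos = len(corpus)

def pmi(a, b):
    """Positive pointwise mutual information between two tags (0 if never co-occurring)."""
    joint = pair_counts[frozenset((a, b))]
    if joint == 0:
        return 0.0
    p_ab = joint / n_photos
    p_a = tag_counts[a] / n_photos
    p_b = tag_counts[b] / n_photos
    return max(math.log(p_ab / (p_a * p_b)), 0.0)

def relevance(photo_tags, concept):
    """Mean association of a photo's tags with the concept tag."""
    scores = [pmi(t, concept) for t in photo_tags if t != concept]
    return sum(scores) / len(scores) if scores else 0.0

# Rank two candidate photos for the concept "beach": the seaside photo
# scores higher than the cat photo, whose tags never co-occur with "beach".
print(relevance(["sea", "sand", "sunset"], "beach") >
      relevance(["cat", "pet"], "beach"))  # True
```

In the paper's setting, `pmi` would be replaced by cosine similarity in a word2vec embedding space (trained, e.g., with gensim), and the tag-based ranking would then be refined by visual re-ranking; a single-tag-match baseline corresponds to checking only whether the concept tag itself appears in `photo_tags`.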
- K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. In M. Valstar, A. French, and T. Pridmore, editors, Proceedings of the British Machine Vision Conference. BMVA Press, 2014.
- D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber. Multi-column deep neural network for traffic sign classification. Neural Networks, 32:333--338, 2012.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In International Conference on Machine Learning, pages 647--655, 2014.
- S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems, 2006.
- C. Hentschel, H. Sack, and N. Steinmetz. Cross-Dataset Learning of Visual Concepts. In A. Nürnberger, S. Stober, B. Larsen, and M. Detyniecki, editors, Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, volume 8382, pages 87--101. Springer International Publishing, 2013.
- W. H. Hsu, L. S. Kennedy, and S.-F. Chang. Video search reranking via information bottleneck principle. In Proceedings of the 14th Annual ACM International Conference on Multimedia, MULTIMEDIA '06, pages 35--44, New York, NY, USA, 2006. ACM.
- M. J. Huiskes, B. Thomee, and M. S. Lew. New trends and ideas in visual concept detection. In Proceedings of the International Conference on Multimedia Information Retrieval, MIR '10, page 527, New York, NY, USA, 2010. ACM Press.
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the ACM International Conference on Multimedia, MM '14, pages 675--678, 2014.
- L. Kennedy, M. Naaman, S. Ahern, R. Nair, and T. Rattenbury. How Flickr helps us make sense of the world: Context and content in community-contributed media collections. In Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA '07, pages 631--640, New York, NY, USA, 2007. ACM.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, pages 1--9, 2012.
- S. Lee, W. De Neve, and Y. M. Ro. Image tag refinement along the 'what' dimension using tag categorization and neighbor voting. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 48--53, July 2010.
- X. Li, C. G. M. Snoek, and M. Worring. Learning Tag Relevance by Neighbor Voting for Social Image Retrieval. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, MIR '08, pages 180--187, New York, NY, USA, 2008. ACM.
- D. Liu, X.-S. Hua, M. Wang, and H.-J. Zhang. Image retagging. In Proceedings of the International Conference on Multimedia, MM '10, pages 491--500, New York, NY, USA, 2010. ACM.
- K. K. Matusiak. Towards user-centered indexing in digital image collections, 2006.
- T. Mikolov, G. Corrado, K. Chen, and J. Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations (ICLR 2013), pages 1--12, 2013.
- G. Park, Y. Baek, and H.-K. Lee. Majority based ranking approach in web image retrieval. In E. Bakker, M. Lew, T. Huang, N. Sebe, and X. Zhou, editors, Image and Video Retrieval, volume 2728 of Lecture Notes in Computer Science, pages 111--120. Springer Berlin Heidelberg, 2003.
- A. Popescu and G. Grefenstette. Deducing trip related information from Flickr. In Proceedings of the 18th International Conference on World Wide Web, pages 1183--1184. ACM, 2009.
- R. Rehurek and P. Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45--50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015.
- A. Sun and S. S. Bhowmick. Quantifying tag representativeness of visual content of social images. In Proceedings of the 18th International Conference on Multimedia, pages 471--480, Firenze, Italy, October 2010.
- X.-J. Wang, L. Zhang, X. Li, and W.-Y. Ma. Annotating images by mining image search results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1919--1932, Nov. 2008.
- Y. Yang, Y. Gao, H. Zhang, J. Shao, and T.-S. Chua. Image tagging with social assistance. In Proceedings of the International Conference on Multimedia Retrieval, ICMR '14, pages 81:81--81:88, New York, NY, USA, 2014. ACM.
- M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision -- ECCV 2014, 13th European Conference, volume 8689, pages 818--833. Springer International Publishing, 2014.
- G. Zhu, S. Yan, and Y. Ma. Image tag refinement towards low-rank, content-tag prior and error sparsity. In Proceedings of the International Conference on Multimedia, MM '10, pages 461--470, New York, NY, USA, 2010. ACM.
Index Terms
- Learning from the uncertain: leveraging social communities to generate reliable training data for visual concept detection tasks