ABSTRACT
Recent advances in visual concept detection based on deep convolutional neural networks have been possible only because of the availability of huge training datasets provided by benchmarking initiatives such as ImageNet. Assembling reliably annotated training data is still a largely manual effort that can only be approached efficiently through crowd-working tasks. On the other hand, user-generated photos and annotations are available at almost no cost in social photo communities such as Flickr. Leveraging the information available in these communities may help to extend existing datasets as well as to create new ones for entirely different classification scenarios. However, user-generated photo annotations are known to be incomplete and subjective, and they do not necessarily relate to the depicted content. In this paper, we therefore present an approach to reliably identify photos relevant to a given visual concept category. We have downloaded additional metadata for 1 million Flickr images and have trained a language model on the user-generated annotations. Relevance estimation is based on the accordance of an image's annotation data with our language model and on subsequent visual re-ranking. Experimental results demonstrate the potential of the proposed method: comparison with a baseline approach based on single-tag matching shows significant improvements.
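The core idea of scoring a photo by how well its user tags agree with a language model learned from community annotations can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the toy corpus, the PMI-based association measure, and all function names are assumptions (the actual system trains a word2vec model on 1 million Flickr annotations and adds visual re-ranking).

```python
# Hypothetical sketch of tag-based relevance estimation: score how well a
# photo's tags match a target concept using tag co-occurrence statistics
# as a stand-in for the paper's learned language model.
from collections import Counter
from itertools import combinations
import math

# Toy "Flickr" corpus: each photo is a set of user tags (illustrative data).
corpus = [
    ["beach", "sea", "sand", "sunset"],
    ["beach", "ocean", "waves"],
    ["cat", "pet", "cute"],
    ["sea", "waves", "sunset"],
    ["cat", "kitten", "pet"],
]

# Count single tags and tag pairs to estimate co-occurrence probabilities.
tag_counts = Counter(t for tags in corpus for t in set(tags))
pair_counts = Counter(
    frozenset(p) for tags in corpus for p in combinations(sorted(set(tags)), 2)
)
n_photos = len(corpus)

def pmi(a, b):
    """Positive pointwise mutual information between two tags (0 if never co-occurring)."""
    joint = pair_counts[frozenset((a, b))]
    if joint == 0:
        return 0.0
    p_ab = joint / n_photos
    p_a = tag_counts[a] / n_photos
    p_b = tag_counts[b] / n_photos
    return max(math.log(p_ab / (p_a * p_b)), 0.0)

def relevance(photo_tags, concept):
    """Mean association of a photo's tags with the concept tag."""
    scores = [pmi(t, concept) for t in photo_tags if t != concept]
    return sum(scores) / len(scores) if scores else 0.0

# Rank two candidate photos for the concept "beach": the seaside photo
# scores higher than the cat photo, whose tags never co-occur with "beach".
print(relevance(["sea", "sand", "sunset"], "beach") >
      relevance(["cat", "pet"], "beach"))  # True
```

In the paper's setting, `pmi` would be replaced by cosine similarity in a word2vec embedding space (trained, e.g., with gensim), and the tag-based ranking would then be refined by visual re-ranking; a single-tag-match baseline corresponds to checking only whether the concept tag itself appears in `photo_tags`.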
- K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. In M. Valstar, A. French, and T. Pridmore, editors, Proceedings of the British Machine Vision Conference. BMVA Press, 2014.
- D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber. Multi-column deep neural network for traffic sign classification. Neural Networks, 32:333--338, 2012.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In International Conference on Machine Learning, pages 647--655, 2014.
- S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems, 2006.
- C. Hentschel, H. Sack, and N. Steinmetz. Cross-Dataset Learning of Visual Concepts. In A. Nürnberger, S. Stober, B. Larsen, and M. Detyniecki, editors, Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, volume 8382, pages 87--101. Springer International Publishing, 2013.
- W. H. Hsu, L. S. Kennedy, and S.-F. Chang. Video search reranking via information bottleneck principle. In Proceedings of the 14th Annual ACM International Conference on Multimedia, MULTIMEDIA '06, pages 35--44, New York, NY, USA, 2006. ACM.
- M. J. Huiskes, B. Thomee, and M. S. Lew. New trends and ideas in visual concept detection. In Proceedings of the International Conference on Multimedia Information Retrieval, MIR '10, page 527, New York, NY, USA, 2010. ACM Press.
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the ACM International Conference on Multimedia, MM '14, pages 675--678, 2014.
- L. Kennedy, M. Naaman, S. Ahern, R. Nair, and T. Rattenbury. How Flickr helps us make sense of the world: Context and content in community-contributed media collections. In Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA '07, pages 631--640, New York, NY, USA, 2007. ACM.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, pages 1--9, 2012.
- S. Lee, W. De Neve, and Y. M. Ro. Image tag refinement along the 'what' dimension using tag categorization and neighbor voting. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 48--53, July 2010.
- X. Li, C. G. M. Snoek, and M. Worring. Learning Tag Relevance by Neighbor Voting for Social Image Retrieval. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, MIR '08, pages 180--187, New York, NY, USA, 2008. ACM.
- D. Liu, X.-S. Hua, M. Wang, and H.-J. Zhang. Image retagging. In Proceedings of the International Conference on Multimedia, MM '10, pages 491--500, New York, NY, USA, 2010. ACM.
- K. K. Matusiak. Towards user-centered indexing in digital image collections, 2006.
- T. Mikolov, G. Corrado, K. Chen, and J. Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations (ICLR 2013), pages 1--12, 2013.
- G. Park, Y. Baek, and H.-K. Lee. Majority based ranking approach in web image retrieval. In E. Bakker, M. Lew, T. Huang, N. Sebe, and X. Zhou, editors, Image and Video Retrieval, volume 2728 of Lecture Notes in Computer Science, pages 111--120. Springer Berlin Heidelberg, 2003.
- A. Popescu and G. Grefenstette. Deducing trip related information from Flickr. In Proceedings of the 18th International Conference on World Wide Web, pages 1183--1184. ACM, 2009.
- R. Rehurek and P. Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45--50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015.
- A. Sun and S. S. Bhowmick. Quantifying tag representativeness of visual content of social images. In Proceedings of the 18th International Conference on Multimedia, pages 471--480, Firenze, Italy, October 2010.
- X.-J. Wang, L. Zhang, X. Li, and W.-Y. Ma. Annotating images by mining image search results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1919--1932, Nov. 2008.
- Y. Yang, Y. Gao, H. Zhang, J. Shao, and T.-S. Chua. Image tagging with social assistance. In Proceedings of the International Conference on Multimedia Retrieval, ICMR '14, pages 81:81--81:88, New York, NY, USA, 2014. ACM.
- M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision -- ECCV 2014, 13th European Conference, volume 8689, pages 818--833. Springer International Publishing, 2014.
- G. Zhu, S. Yan, and Y. Ma. Image tag refinement towards low-rank, content-tag prior and error sparsity. In Proceedings of the International Conference on Multimedia, MM '10, pages 461--470, New York, NY, USA, 2010. ACM.
Index Terms
- Learning from the uncertain: leveraging social communities to generate reliable training data for visual concept detection tasks