DOI: 10.1145/2809563.2809587
Research article

Learning from the uncertain: leveraging social communities to generate reliable training data for visual concept detection tasks

Published: 21 October 2015

ABSTRACT

Recent advances in visual concept detection based on deep convolutional neural networks have only been possible because of the huge training datasets provided by benchmarking initiatives such as ImageNet. Assembling reliably annotated training data is still a largely manual effort that can only be approached efficiently through crowd-working tasks. On the other hand, user-generated photos and annotations are available at almost no cost in social photo communities such as Flickr. Leveraging the information available in these communities may help to extend existing datasets as well as to create new ones for completely different classification scenarios. However, user-generated annotations of photos are known to be incomplete and subjective, and they do not necessarily relate to the depicted content. In this paper, we therefore present an approach to reliably identify photos relevant to a given visual concept category. We have downloaded additional metadata for 1 million Flickr images and trained a language model on the user-generated annotations. Relevance estimation is based on the accordance of an image's annotation data with our language model and on subsequent visual re-ranking. Experimental results demonstrate the potential of the proposed method: a comparison with a baseline approach based on single-tag matching shows significant improvements.
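
To make the relevance-estimation step concrete, here is a minimal, hypothetical sketch of the textual part of such a pipeline. It assumes the language model is a word2vec embedding trained on per-photo tag lists (the abstract does not specify the actual model the authors use) and scores a photo by the average embedding similarity between its tags and the concept term. The names (tag_lists, relevance) and the toy data are purely illustrative, and the subsequent visual re-ranking step is not shown.

    # Hypothetical sketch (not the authors' code): train a word2vec model on
    # user-generated Flickr tag lists and rank photos by textual relevance
    # to a visual concept. Requires the gensim library.
    from gensim.models import Word2Vec

    # One "sentence" per photo: its list of user-generated tags (toy data;
    # the paper works with metadata from roughly 1 million Flickr images).
    tag_lists = [
        ["sunset", "beach", "ocean", "holiday"],
        ["dog", "puppy", "park"],
        ["sunset", "sky", "clouds", "evening"],
    ]

    # Train tag embeddings; tags that co-occur on photos end up close
    # together in vector space.
    model = Word2Vec(tag_lists, vector_size=100, window=5, min_count=1)

    def relevance(tags, concept):
        """Average similarity between a photo's tags and the concept term."""
        sims = [model.wv.similarity(t, concept) for t in tags if t in model.wv]
        return sum(sims) / len(sims) if sims else 0.0

    # Rank candidate photos for the concept "sunset"; in the full pipeline
    # a visual re-ranking step would follow.
    photos = {"photo1": ["sunset", "beach"], "photo2": ["dog", "park"]}
    ranked = sorted(photos, key=lambda p: relevance(photos[p], "sunset"),
                    reverse=True)
    print(ranked)

A single-tag-matching baseline, by contrast, would simply keep every photo whose tag list contains the literal concept term; the abstract reports that this baseline performs significantly worse.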


Published in

i-KNOW '15: Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business
October 2015, 314 pages
ISBN: 978-1-4503-3721-2
DOI: 10.1145/2809563
General Chairs: Stefanie Lindstaedt, Tobias Ley, Harald Sack

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


Acceptance Rates

i-KNOW '15 paper acceptance rate: 25 of 78 submissions (32%). Overall acceptance rate: 77 of 238 submissions (32%).