DOI: 10.1145/2324796.2324842
research-article

Multimodal feature generation framework for semantic image classification

Published: 05 June 2012

ABSTRACT

The automatic attribution of semantic labels to unlabeled or weakly labeled images has received considerable attention but, given the complexity of the problem, remains a hard research topic. Here we propose a unified classification framework that combines textual and visual information in a seamless manner. Unlike most recent work, we use computer vision techniques as inspiration for processing textual information. To do so, we consider two complementary types of tag similarity, computed respectively from a conceptual hierarchy and from data collected from a photo-sharing platform. Visual content is processed using recent techniques for bag-of-visual-words feature generation. A central contribution of our work is to infer the coding step of the general bag-of-words framework with such similarities and to aggregate these tag codes by max-pooling to obtain a single representative vector (signature). Final image annotations are obtained via late fusion, where the three modalities (two text-based and one visual) are merged during the classification step. Experimental results on the Pascal VOC 2007 and MIR Flickr datasets show an improvement over state-of-the-art methods while significantly decreasing the computational complexity of the learning system.
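
The tag coding and max-pooling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook, the toy similarity table, and the function names (`code_tag`, `tag_signature`, `late_fusion`) are hypothetical, and late fusion is shown as a plain average of per-modality scores, which is only one possible fusion rule.

```python
from typing import Callable, Dict, List, Sequence

Similarity = Callable[[str, str], float]


def make_table_similarity(table: Dict[frozenset, float]) -> Similarity:
    """Build a symmetric tag similarity from a toy lookup table (hypothetical data)."""
    def sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        # Unknown pairs are treated as unrelated.
        return table.get(frozenset((a, b)), 0.0)
    return sim


def code_tag(tag: str, codebook: Sequence[str], sim: Similarity) -> List[float]:
    """Coding step: describe one tag by its similarity to every codeword."""
    return [sim(tag, word) for word in codebook]


def tag_signature(tags: Sequence[str], codebook: Sequence[str],
                  sim: Similarity) -> List[float]:
    """Pooling step: element-wise max over the codes of all tags of an image."""
    if not tags:
        return [0.0] * len(codebook)
    codes = [code_tag(tag, codebook, sim) for tag in tags]
    return [max(column) for column in zip(*codes)]


def late_fusion(scores: Sequence[float]) -> float:
    """Late fusion as a plain average of per-modality scores (an assumption)."""
    return sum(scores) / len(scores)


# Toy example: a three-word codebook and an image tagged "puppy" and "road".
codebook = ["dog", "car", "sky"]
toy_sim = make_table_similarity({
    frozenset(("puppy", "dog")): 0.9,
    frozenset(("road", "car")): 0.6,
    frozenset(("puppy", "car")): 0.1,
})
signature = tag_signature(["puppy", "road"], codebook, toy_sim)
print(signature)  # [0.9, 0.6, 0.0]
```

In this sketch one signature would be built per similarity source (hierarchy-based and photo-platform-based), each feeding its own classifier alongside the visual one, with `late_fusion` combining the three resulting scores.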


      • Published in

        ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
        June 2012
        489 pages
        ISBN:9781450313292
        DOI:10.1145/2324796

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        ICMR '12 paper acceptance rate: 50 of 145 submissions, 34%. Overall acceptance rate: 254 of 830 submissions, 31%.
