DOI: 10.1145/2324796.2324832
research-article

Learning to summarize web image and text mutually

Published: 05 June 2012

ABSTRACT

We consider the problem of learning to summarize images with text and to visualize text with images, which we call Mutual-Summarization. We divide the web image-text data space into three subspaces: the pure image space (PIS), the pure text space (PTS), and the image-text joint space (ITJS). We treat the ITJS as a knowledge base.

To summarize images with sentences, we map images from PIS to ITJS via image classification models and apply text summarization to the corresponding texts in ITJS. To visualize text, we map texts from PTS to ITJS via text categorization models and generate the visualization by choosing semantically related images from ITJS, ranked by their confidence. In both approaches, images are represented by color histograms, dense visual words, and feature descriptors at different levels of a spatial pyramid, and texts are modeled with the Latent Dirichlet Allocation (LDA) topic model. Multiple Kernel (MK) methods are used to learn the image and text classifiers. We show Mutual-Summarization results on our newly collected dataset of six major events ("Gulf Oil Spill", "Haiti Earthquake", etc.) and demonstrate improved cross-media retrieval performance over existing methods in terms of MAP, precision, and recall.
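
The text-visualization step and the MAP metric mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the `JointItem` record, the `visualize_text` and `average_precision` helpers, the sample data, and the confidence values are all hypothetical, and the text categorization model (learned with multiple kernel methods in the paper) is assumed to have already produced a predicted category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointItem:
    """One image-text pair in the joint space (ITJS). All fields are hypothetical."""
    image_id: str
    text: str
    category: str
    confidence: float  # assumed classifier confidence for `category`


def visualize_text(predicted_category, itjs, top_k=3):
    """Return ids of the top_k ITJS images in the predicted category,
    ordered from most to least confident."""
    candidates = [item for item in itjs if item.category == predicted_category]
    ranked = sorted(candidates, key=lambda item: item.confidence, reverse=True)
    return [item.image_id for item in ranked[:top_k]]


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list, normalized by the number of
    relevant items. MAP is the mean of this value over all queries."""
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


# Toy knowledge base; the categories reuse two event names from the abstract.
itjs = [
    JointItem("img1", "oil reaches the coast", "Gulf Oil Spill", 0.90),
    JointItem("img2", "rescue teams arrive", "Haiti Earthquake", 0.80),
    JointItem("img3", "cleanup crews at sea", "Gulf Oil Spill", 0.95),
    JointItem("img4", "press briefing", "Gulf Oil Spill", 0.40),
]

# Assume the text classifier mapped the query text to "Gulf Oil Spill".
ranked = visualize_text("Gulf Oil Spill", itjs, top_k=2)
ap = average_precision(ranked, {"img3", "img4"})
```

Passing the predicted category in as an argument keeps the sketch independent of any particular classifier; in practice the confidence scores would come from the trained image models rather than being fixed by hand.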

    • Published in

      ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
      June 2012
      489 pages
      ISBN: 9781450313292
      DOI: 10.1145/2324796

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ICMR '12 paper acceptance rate: 50 of 145 submissions, 34%
      Overall acceptance rate: 254 of 830 submissions, 31%
