ABSTRACT
We consider the problem of learning to summarize images by text and visualize text utilizing images, which we call Mutual-Summarization. We divide the web image-text data space into three subspaces, namely pure image space (PIS), pure text space (PTS) and image-text joint space (ITJS). Naturally, we treat the ITJS as a knowledge base.
For summarizing images by sentence issue, we map images from PIS to ITJS via image classification models and use text summarization on the corresponding texts in ITJS to summarize images. For text visualization problem, we map texts from PTS to ITJS via text categorization models and generate the visualization by choosing the semantic related images from ITJS, where the selected images are ranked by their confidence. In above approaches images are represented by color histograms, dense visual words and feature descriptors at different levels of spatial pyramid; and the texts are generated according to the Latent Dirichlet Allocation (LDA) topic model. Multiple Kernel (MK) methodologies are used to learn classifiers for image and text respectively. We show the Mutual-Summarization results on our newly collected dataset of six big events ("Gulf Oil Spill", "Haiti Earthquake", etc.) as well as demonstrate improved cross-media retrieval performance over existing methods in terms of MAP, Precision and Recall.
- R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463. ACM press New York, 1999. Google ScholarDigital Library
- K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. Blei, and M. Jordan. Matching words and pictures. The Journal of Machine Learning Research, 3:1107--1135, 2003. Google ScholarDigital Library
- D. Blei and M. Jordan. Modeling annotated data. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, pages 127--134. ACM, 2003. Google ScholarDigital Library
- D. Blei, A. Ng, and M. Jordan. Latent dirichlet allocation. The Journal of Machine Learning Research, 3:993--1022, 2003. Google ScholarDigital Library
- G. Carneiro, A. Chan, P. Moreno, and N. Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 394--410, 2007. Google ScholarDigital Library
- C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. Google ScholarDigital Library
- N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886--893. IEEE, 2005. Google ScholarDigital Library
- V. Delaitre, L. I., and S. J. Recognizing human actions in still images: a study of bag-of-features and part-based representations. In British Machine Vision Conference, 2009.Google Scholar
- A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every Picture Tells a Story: Generating Sentences from Images. ECCV 2010, pages 15--29, 2010. Google ScholarDigital Library
- L. Fei-Fei and L. Li. What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization. Computer Vision, pages 157--171, 2010.Google Scholar
- P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009. Google ScholarDigital Library
- S. Feng, R. Manmatha, and V. Lavrenko. Multiple bernoulli relevance models for image and video annotation. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2. IEEE, 2004. Google ScholarDigital Library
- A. Gupta, P. Srinivasan, J. Shi, and L. Davis. Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos. In IEEE Conference on Computer Vision and Pattern Recognition., pages 2012--2019. Citeseer, 2009.Google ScholarCross Ref
- H. Hotelling. Relations between two sets of variates. Biometrika, 28(3-4):321, 1936.Google ScholarCross Ref
- J. Jeon, V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, pages 119--126. ACM, 2003. Google ScholarDigital Library
- T. Kliegr, K. Chandramouli, J. Nemrava, V. Svatek, and E. Izquierdo. Combining image captions and visual analysis for image concept classification. In Proceedings of the 9th International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD 2008, pages 8--17. ACM, 2008. Google ScholarDigital Library
- S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 2169--2178. IEEE, 2006. Google ScholarDigital Library
- L. Li and L. Fei-Fei. Optimol: automatic online picture collection via incremental model learning. International Journal of Computer Vision, 88(2):147--168, 2010. Google ScholarDigital Library
- P. Li and J. Ma. What is happening in a still picture? In First Asian Conference on Pattern Recognition (ACPR), pages 32--36. IEEE, 2011.Google Scholar
- A. Nakagawa, A. Kutics, K. Tanaka, and M. Nakajima. Combining words and object-based visual features in image retrieval. 2003.Google ScholarCross Ref
- A. Oliva and A. Torralba. Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155:23--36, 2006.Google ScholarCross Ref
- T. Pham, N. Maillot, J. Lim, and J. Chevallet. Latent semantic fusion model for image retrieval and annotation. In Proceedings of the 16th ACM Conference on Information and Knowledge Management, pages 439--444. ACM, 2007. Google ScholarDigital Library
- A. Quattoni, M. Collins, and T. Darrell. Learning visual representations using images with captions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1--8. IEEE, 2007.Google ScholarCross Ref
- D. Radev, T. Allison, S. Blair-Goldensohn, J. Blitzer, A. Çelebi, S. Dimitrov, E. Drabek, A. Hakim, W. Lam, D. Liu, J. Otterbacher, H. Qi, H. Saggion, S. Teufel, M. Topper, A. Winkel, and Z. Zhang. MEAD - a platform for multidocument multilingual text summarization. In LREC 2004, Lisbon, Portugal, May 2004.Google Scholar
- N. Rasiwasia, J. Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy, and N. Vasconcelos. A New Approach to Cross-Modal Multimedia Retrieval. In Proceedings of ACM International Conference on Multimedia. ACM, 2010. Google ScholarDigital Library
- A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In IEEE International Conference on Computer Vision, pages 606--613. IEEE, 2010.Google Scholar
- G. Wang, D. Hoiem, and D. Forsyth. Building text features for object image classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1367--1374. IEEE, 2009.Google ScholarCross Ref
- T. Westerveld. Probabilistic multimedia retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, pages 437--438. ACM, 2002. Google ScholarDigital Library
- B. Yao, X. Yang, L. Lin, M. Lee, and S. Zhu. I2T: Image parsing to text description. Proceedings of the IEEE, 98(8):1485--1508, 2010.Google ScholarCross Ref
Index Terms
- Learning to summarize web image and text mutually
Recommendations
Topic-driven reader comments summarization
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge managementReaders of a news article often read its comments contributed by other readers. By reading comments, readers obtain not only complementary information about this news article but also the opinions from other readers. However, the existing ranking ...
Topic sentiment change analysis
MLDM'11: Proceedings of the 7th international conference on Machine learning and data mining in pattern recognitionPublic opinions on a topic may change over time. Topic Sentiment change analysis is a new research problem consisting of two main components: (a) mining opinions on a certain topic, and (b) detect significant changes of sentiment of the opinions on the ...
Research on Multi-document Summarization Based on LDA Topic Model
IHMSC '14: Proceedings of the 2014 Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics - Volume 02Compared with VSM (Vector Space Model) and graph-ranking models, LDA (Latent Dirichlet Allocation) Model can discover latent topics in the corpus and latent topics are beneficial to use sentence-ranking mechanisms to form a good summary. In the paper, ...
Comments