Learning to summarize web image and text mutually

Authors:
Piji Li, Shandong University, Jinan, China
Jun Ma, Shandong University, Jinan, China
Shuai Gao, Shandong University, Jinan, China

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, June 2012, Article No.: 28, Pages 1–8. https://doi.org/10.1145/2324796.2324832

Published: 05 June 2012

ABSTRACT

We consider the problem of learning to summarize images with text and to visualize text with images, which we call Mutual-Summarization. We divide the web image-text data space into three subspaces, namely pure image space (PIS), pure text space (PTS) and image-text joint space (ITJS). Naturally, we treat the ITJS as a knowledge base.

To summarize images with sentences, we map images from PIS to ITJS via image classification models and apply text summarization to the corresponding texts in ITJS. To visualize text, we map texts from PTS to ITJS via text categorization models and generate the visualization by choosing semantically related images from ITJS, ranked by their confidence. In both approaches, images are represented by color histograms, dense visual words and feature descriptors at different levels of a spatial pyramid, and the texts are generated according to the Latent Dirichlet Allocation (LDA) topic model. Multiple Kernel (MK) methodologies are used to learn separate classifiers for images and texts. We show Mutual-Summarization results on our newly collected dataset of six major events ("Gulf Oil Spill", "Haiti Earthquake", etc.) and demonstrate improved cross-media retrieval performance over existing methods in terms of MAP, Precision and Recall.
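The abstract names MAP, Precision and Recall as the retrieval measures but does not state the formulas. The sketch below uses the standard uninterpolated definitions; the paper may use a different variant (for example a fixed cutoff or interpolated precision). The function names and the item identifiers such as `img1` are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def precision_recall(ranked: Sequence[Hashable],
                     relevant: Set[Hashable],
                     k: int) -> Tuple[float, float]:
    """Precision and recall computed over the top-k items of a ranked list."""
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: Sequence[Hashable],
                      relevant: Set[Hashable]) -> float:
    """Uninterpolated average precision for a single query.

    Precision is sampled at each rank where a relevant item appears,
    and the sum is divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
        runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Mean of the per-query average precision over (ranked, relevant) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical query: a text is mapped to the joint space and four
    # candidate images come back in ranked order; two of them are relevant.
    ranked = ["img3", "img1", "img7", "img2"]
    relevant = {"img1", "img2"}
    print(precision_recall(ranked, relevant, k=2))  # (0.5, 0.5)
    print(average_precision(ranked, relevant))      # 0.5
```

In the example, relevant images appear at ranks 2 and 4, so the average precision is (1/2 + 2/4) / 2 = 0.5. Under this definition, a ranking that places relevant items earlier scores higher even when precision and recall at the full list length are unchanged.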

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN: 9781450313292
DOI: 10.1145/2324796
Conference Chairs: Horace H. S. Ip (City University of Hong Kong), Yong Rui (Microsoft, China)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cross-media retrieval
image-text joint space
multiple kernel learning
mutual-summarization
topic model
Qualifiers
- research-article
Acceptance Rates
ICMR '12 Paper Acceptance Rate: 50 of 145 submissions, 34%. Overall Acceptance Rate: 254 of 830 submissions, 31%.