research-article

Enhancing diversity, coverage and balance for summarization through structure learning

Authors:
Liangda Li, Shanghai Jiao-Tong University, Shanghai, China
Ke Zhou, Shanghai Jiao-Tong University, Shanghai, China
Gui-Rong Xue, Shanghai Jiao-Tong University, Shanghai, China
Hongyuan Zha, Georgia Institute of Technology, Atlanta, GA, USA
Yong Yu, Shanghai Jiao-Tong University, Shanghai, China

WWW '09: Proceedings of the 18th international conference on World wide web, April 2009, pages 71–80. https://doi.org/10.1145/1526709.1526720

Published: 20 April 2009

ABSTRACT

Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom consider summary diversity, coverage, and balance simultaneously, even though these properties largely determine the quality of a summary. In this paper, we consider extract-based summarization with emphasis on three requirements: 1) diversity, which seeks to reduce redundancy among the sentences in the summary; 2) sufficient coverage, which aims to avoid losing the document's main information when generating the summary; and 3) balance, which requires that the different aspects of the document receive roughly the same relative importance in the summary. We formulate extract-based summarization as learning a mapping from the set of sentences of a given document to a subset of those sentences that satisfies these three requirements. The mapping is learned by incorporating several constraints into a structure learning framework; we explore the graph structure of the output variables and employ a structural SVM to solve the resulting optimization problem. Experiments on the DUC 2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.
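To make the three requirements concrete, the sketch below scores one candidate extract (a subset of sentence indices) with simple bag-of-words proxies. The measures themselves (mean pairwise cosine similarity as redundancy, best-match similarity as coverage, normalized entropy of aspect labels as balance), the function names, and the assumption that every sentence already carries an aspect label are all hypothetical illustrations. They are not the paper's structural-SVM formulation or its constraints, which the abstract does not specify.

```python
from collections import Counter
from math import log, sqrt
from typing import Dict, Sequence


def bow(sentence: str) -> Counter:
    # Bag of words: lowercased whitespace tokens with their counts.
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def redundancy(summary: Sequence[str]) -> float:
    # Diversity proxy: mean pairwise similarity among summary sentences
    # (lower means more diverse).
    vecs = [bow(s) for s in summary]
    sims = [cosine(vecs[i], vecs[j])
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return sum(sims) / len(sims) if sims else 0.0


def coverage(document: Sequence[str], summary: Sequence[str]) -> float:
    # Coverage proxy: for each document sentence, take its best match
    # in the summary, then average over the document.
    svecs = [bow(s) for s in summary]
    if not svecs or not document:
        return 0.0
    return sum(max(cosine(bow(d), s) for s in svecs) for d in document) / len(document)


def balance(aspects: Sequence[int], selected: Sequence[int]) -> float:
    # Balance proxy (hypothetical aspect labels, one per sentence):
    # normalized entropy of the selected sentences' aspects,
    # 1.0 when every aspect of the document appears equally often.
    k = len(set(aspects))
    if k < 2:
        return 1.0  # a single aspect is trivially balanced
    if not selected:
        return 0.0
    counts = Counter(aspects[i] for i in selected)
    n = len(selected)
    probs = [c / n for c in counts.values()]
    h = sum(-p * log(p) for p in probs)
    return h / log(k)


def score(document: Sequence[str], aspects: Sequence[int],
          selected: Sequence[int]) -> Dict[str, float]:
    # Evaluate one candidate extract given as sentence indices.
    summary = [document[i] for i in selected]
    return {
        "redundancy": redundancy(summary),
        "coverage": coverage(document, summary),
        "balance": balance(aspects, selected),
    }


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "stocks fell sharply today",
        "markets dropped as stocks fell",
    ]
    aspects = [0, 0, 1, 1]  # hypothetical aspect label for each sentence
    print(score(doc, aspects, [0, 1]))  # two near-duplicates from one aspect
    print(score(doc, aspects, [0, 2]))  # one sentence from each aspect
```

In this toy example, the extract `[0, 2]` has lower redundancy, higher coverage, and perfect balance compared with `[0, 1]`. This is the kind of trade-off the paper's learned mapping is meant to capture, although there it is learned from data rather than fixed by hand-written proxies.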


Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction


Published in
WWW '09: Proceedings of the 18th international conference on World wide web
April 2009
1280 pages
ISBN:9781605584874
DOI:10.1145/1526709
General Chairs: Juan Quemada (DIT-UPM), Gonzalo León (DIT-UPM)
Program Chairs: Yoelle Maarek (Google Inc., Israel), Wolfgang Nejdl (L3S and Hannover University)
Copyright © 2009 IW3C2 org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
balance
coverage
diversity
structural svm
summarization
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Article Metrics
- Total Citations: 94
- Total Downloads: 812
- Downloads (last 12 months): 24
- Downloads (last 6 weeks): 6