ABSTRACT
Document summarization plays an increasingly important role given the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom consider summary diversity, coverage, and balance simultaneously, although these properties largely determine the quality of a summary. In this paper, we consider extract-based summarization with emphasis on three requirements: 1) diversity, which seeks to reduce redundancy among the sentences in the summary; 2) sufficient coverage, which avoids losing the document's main information when generating the summary; and 3) balance, which demands that different aspects of the document have about the same relative importance in the summary. We formulate extract-based summarization as learning a mapping from the set of sentences of a given document to a subset of those sentences that satisfies the three requirements. The mapping is learned by incorporating several constraints into a structure learning framework; we explore the graph structure of the output variables and employ structural SVM to solve the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.
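The three requirements can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions and is not the structural SVM formulation used in the paper: sentences are treated as sets of whitespace-separated words, diversity is taken as one minus the mean pairwise Jaccard similarity, coverage as the fraction of document words present in the summary, and balance as the ratio between the least and most represented aspect. The function names and the keyword-based `aspects` mapping are hypothetical.

```python
from itertools import combinations


def words(sentence):
    """Lowercased set of whitespace-separated tokens (a deliberately crude tokenizer)."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two word sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(summary):
    """One minus the mean pairwise similarity of the summary sentences (assumed measure)."""
    pairs = list(combinations([words(s) for s in summary], 2))
    if not pairs:
        return 1.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def coverage(document, summary):
    """Fraction of the document's distinct words that also occur in the summary."""
    doc_words = set().union(*(words(s) for s in document))
    if not doc_words:
        return 0.0
    summary_words = set().union(*(words(s) for s in summary))
    return len(doc_words & summary_words) / len(doc_words)


def balance(summary, aspects):
    """Ratio of the least to the most represented aspect in the summary.

    `aspects` is a hypothetical mapping from aspect name to a set of keywords;
    a sentence counts toward an aspect if it contains any of its keywords.
    """
    counts = [sum(1 for s in summary if words(s) & keywords)
              for keywords in aspects.values()]
    if not counts or max(counts) == 0:
        return 0.0
    return min(counts) / max(counts)


# Toy document with one duplicated sentence and two keyword-defined aspects.
document = ["the cat sat", "the cat sat", "dogs bark loudly", "birds sing"]
aspects = {"cats": {"cat"}, "dogs": {"dogs"}}
redundant_summary = ["the cat sat", "the cat sat"]
diverse_summary = ["the cat sat", "dogs bark loudly"]
```

On this toy input the redundant summary scores lower than the diverse one on all three measures, which is the behavior the requirements are meant to capture. A learned model would instead weigh such signals through trained parameters rather than fixed formulas.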
Index Terms
- Enhancing diversity, coverage and balance for summarization through structure learning