DOI: 10.1145/1526709.1526720 | ACM Conferences | WWW Conference Proceedings
research-article

Enhancing diversity, coverage and balance for summarization through structure learning

Published: 20 April 2009

ABSTRACT

Document summarization plays an increasingly important role as the number of documents on the Web grows exponentially. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom consider summary diversity, coverage, and balance simultaneously, even though these issues largely determine the quality of a summary. In this paper, we consider extract-based summarization with an emphasis on three requirements: 1) diversity, which seeks to reduce redundancy among the sentences in the summary; 2) sufficient coverage, which focuses on avoiding the loss of the document's main information when generating the summary; and 3) balance, which requires that different aspects of the document have about the same relative importance in the summary. We formulate the extract-based summarization problem as learning a mapping from the set of sentences of a given document to a subset of those sentences that satisfies the above three requirements. The mapping is learned by incorporating several constraints in a structure learning framework; we explore the graph structure of the output variables and employ a structural SVM to solve the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.
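To make the three requirements concrete, the following is a minimal Python sketch that scores a candidate extract for diversity, coverage, and balance. It is not the paper's structural SVM formulation: the word-overlap measures, the function names, the toy document, and the `aspect_of` labeling of sentences are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def tokens(sentence):
    """Lowercased word set; a deliberately crude stand-in for real features."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(summary):
    """One minus the mean pairwise similarity of the summary sentences."""
    pairs = list(combinations([tokens(s) for s in summary], 2))
    if not pairs:
        return 1.0  # fewer than two sentences cannot be redundant
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def coverage(summary, document):
    """Fraction of the document's distinct words that the summary contains."""
    doc_words = set().union(*(tokens(s) for s in document))
    if not doc_words:
        return 0.0
    summary_words = set().union(*(tokens(s) for s in summary))
    return len(summary_words & doc_words) / len(doc_words)


def balance(summary, aspect_of):
    """Ratio of the least to the most represented aspect in the summary.

    aspect_of is a hypothetical mapping from sentence to aspect label;
    aspects absent from the summary count as zero.
    """
    counts = Counter({aspect: 0 for aspect in set(aspect_of.values())})
    counts.update(aspect_of[s] for s in summary)
    most = max(counts.values(), default=0)
    return min(counts.values()) / most if most else 0.0


# Toy document with two assumed aspects ("cat" and "dog").
document = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "dogs bark loudly",
]
aspect_of = {document[0]: "cat", document[1]: "cat", document[2]: "dog"}

redundant = [document[0], document[1]]  # two near-duplicate sentences
varied = [document[0], document[2]]     # one sentence per aspect

for name, summary in [("redundant", redundant), ("varied", varied)]:
    print(name,
          round(diversity(summary), 3),
          round(coverage(summary, document), 3),
          round(balance(summary, aspect_of), 3))
```

In this toy setting the `varied` extract scores higher than the `redundant` one on all three measures, which is the kind of subset the abstract's mapping is meant to prefer. A learned model would replace these hand-written scores with constraints and features fitted to training data.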

