Toward a Unified Framework for Standard and Update Multi-Document Summarization

Published: 01 June 2012

Abstract

This article presents a unified framework for extracting standard and update summaries from a set of documents. A topic modeling approach is employed for salience determination, and a dynamic modeling approach is proposed for redundancy control. For salience determination, we represent text units of every granularity (word, sentence, document, document set, and summary) in a single vector space model via their probability distributions over the inherent topics of the given documents or a related corpus. We can therefore calculate the similarity between any two text units from their topic probability distributions. For redundancy control in standard summarization, we consider the similarity between the summary and the given documents and the similarity between the sentence and the summary, in addition to the similarity between the sentence and the given documents; for update summarization, we also consider the similarity between the sentence and the history documents or summary. Evaluation on the English TAC 2008 and 2009 data shows encouraging results, particularly for the dynamic modeling approach in removing redundancy in the given documents. Finally, we extend the framework to Chinese multi-document summarization, and experiments show the effectiveness of our framework.
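
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the article's actual model: the abstract does not state the similarity measure or how the similarities are combined, so cosine similarity, the additive score, the weights `lam` and `mu`, and every function name below are assumptions made only for illustration. Topic distributions are taken as given inputs (for example, from a topic model trained separately).

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def mean_distribution(dists):
    """Average several topic distributions into one (assumed summary representation)."""
    n = len(dists)
    return [sum(d[k] for d in dists) / n for k in range(len(dists[0]))]


def select_sentences(sent_topics, doc_topics, budget,
                     history_topics=None, lam=1.0, mu=1.0):
    """Greedily pick up to `budget` sentence indices.

    Hypothetical score for candidate sentence i:
        sim(sentence, documents)            salience
      + sim(summary plus i, documents)      summary-level coverage
      - lam * sim(sentence, summary)        redundancy within the summary
      - mu  * sim(sentence, history)        update setting only
    """
    selected = []
    candidates = list(range(len(sent_topics)))
    while candidates and len(selected) < budget:
        best, best_score = None, -math.inf
        for i in candidates:
            trial = mean_distribution([sent_topics[j] for j in selected + [i]])
            score = cosine(sent_topics[i], doc_topics) + cosine(trial, doc_topics)
            if selected:
                current = mean_distribution([sent_topics[j] for j in selected])
                score -= lam * cosine(sent_topics[i], current)
            if history_topics is not None:
                score -= mu * cosine(sent_topics[i], history_topics)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 covers the other topic.
sentences = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]
documents = [0.5, 0.5]

standard = select_sentences(sentences, documents, budget=2)
update = select_sentences(sentences, documents, budget=2,
                          history_topics=[0.1, 0.9], mu=2.0)
```

In this toy setting the standard run avoids picking both near-duplicate sentences, while the update run, given a history dominated by the second topic, steers away from sentence 2. The re-scoring of every candidate against the current summary at each step is what makes the selection "dynamic" in this sketch.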

    • Published in

      ACM Transactions on Asian Language Information Processing  Volume 11, Issue 2
      June 2012
      109 pages
      ISSN:1530-0226
      EISSN:1558-3430
      DOI:10.1145/2184436
      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 June 2012
      • Accepted: 1 July 2011
      • Revised: 1 June 2011
      • Received: 1 November 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
