Toward a Unified Framework for Standard and Update Multi-Document Summarization

Authors:
Hongling Wang, Soochow University
Guodong Zhou, Soochow University

ACM Transactions on Asian Language Information Processing, Volume 11, Issue 2, Article No. 5, pp. 1–18. https://doi.org/10.1145/2184436.2184438

Published: 01 June 2012

Abstract

This article presents a unified framework for extracting standard and update summaries from a set of documents. A topic modeling approach is employed for salience determination, and a dynamic modeling approach is proposed for redundancy control. For salience determination, we represent text units of every granularity (word, sentence, document, document set, and summary) in a single vector space model, each as a probability distribution over the inherent topics of the given documents or a related corpus. The similarity between any two text units can therefore be calculated from their topic probability distributions. For redundancy control in standard summarization, we consider not only the similarity between a sentence and the given documents but also the similarity between the summary and the given documents and the similarity between the sentence and the summary. For update summarization, we additionally consider the similarity between the sentence and the history documents or summary. Evaluation on the English TAC 2008 and 2009 data shows encouraging results, particularly for the dynamic modeling approach in removing redundancy in the given documents. Finally, we extend the framework to Chinese multi-document summarization, and experiments show its effectiveness.
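
The following is a minimal sketch of how the two components described in the abstract might fit together. It is not the article's implementation. Several things are assumptions made only for illustration: the topic distributions are taken as given (for example, from an LDA model), similarity is computed as one minus the Jensen-Shannon divergence, a summary is represented by the mean of its sentences' topic distributions, and the three similarity terms are combined linearly with hypothetical weights `lam` and `mu`. The function names `js_similarity`, `mean_dist`, and `select` are likewise hypothetical.

```python
import math


def js_similarity(p, q):
    """Similarity of two topic distributions: 1 - Jensen-Shannon divergence.

    Uses base-2 logarithms, so the result lies in [0, 1].
    The choice of measure is an assumption, not taken from the article.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


def mean_dist(dists):
    """Average several topic distributions (assumed summary representation)."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]


def select(sentences, docs, k, history=None, lam=0.5, mu=0.5):
    """Greedily pick k sentence indices.

    sentences: list of per-sentence topic distributions
    docs:      topic distribution of the given document set
    history:   topic distribution of the history documents or summary
               (update summarization only; None for standard summarization)
    lam, mu:   hypothetical redundancy weights
    """
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:

        def score(i):
            # Salience: sentence vs. the given documents.
            s = js_similarity(sentences[i], docs)
            # Summary (with the candidate added) vs. the given documents.
            candidate = mean_dist([sentences[j] for j in chosen + [i]])
            s += js_similarity(candidate, docs)
            # Redundancy: sentence vs. the summary built so far.
            if chosen:
                current = mean_dist([sentences[j] for j in chosen])
                s -= lam * js_similarity(sentences[i], current)
            # Update mode: penalize overlap with the history.
            if history is not None:
                s -= mu * js_similarity(sentences[i], history)
            return s

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data over two topics: sentences 0 and 1 are duplicates.
sentences = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
docs = [0.5, 0.5]
standard = select(sentences, docs, k=2)
update = select(sentences, docs, k=1, history=[0.9, 0.1])
```

In this toy run, the standard setting skips the duplicate sentence because the score is recomputed against the growing summary at every step, and the update setting avoids the sentence whose topics the history already covers.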

Index Terms

Computing methodologies > Artificial intelligence > Natural language processing > Language resources

Published in

ACM Transactions on Asian Language Information Processing Volume 11, Issue 2
June 2012
109 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/2184436

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 2012
- Accepted: 1 July 2011
- Revised: 1 June 2011
- Received: 1 November 2010
Author Tags
Multi-document summarization
dynamic modeling
latent Dirichlet allocation
topic modeling
Qualifiers
- research-article
- Research
- Refereed