ABSTRACT
In this paper we describe XDoX, a cross-document summarizer designed specifically to summarize large document sets (50-500 documents or more). Such sets are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set, at a level of granularity controlled by the user, and composing an extraction summary that reflects these main themes. The current version of XDoX is not optimized to produce a summary of a few unrelated documents; such summaries are best obtained simply by concatenating summaries of the individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document Understanding Conference (DUC).
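The general idea of theme-based extraction can be illustrated with a minimal sketch. This is not the XDoX implementation, which the abstract does not specify; it assumes a simple greedy clustering of passages by cosine similarity over raw word counts, with a hypothetical `threshold` parameter standing in for the user-regulated granularity. All function names here are illustrative.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    # Lowercase alphabetic tokens only; a real system would likely also
    # remove stop words and apply term weighting (assumption).
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_themes(passages, threshold):
    # Greedy single-pass clustering (an assumed stand-in for theme detection):
    # each passage joins the most similar existing theme if the similarity
    # reaches `threshold`, otherwise it starts a new theme.
    themes = []
    for i, passage in enumerate(passages):
        vec = Counter(tokenize(passage))
        best, best_sim = None, threshold
        for theme in themes:
            sim = cosine(vec, theme["centroid"])
            if sim >= best_sim:
                best, best_sim = theme, sim
        if best is None:
            themes.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # accumulate word counts
            best["members"].append(i)
    return themes


def summarize(passages, threshold=0.3, max_themes=3):
    # Rank themes by size (a crude proxy for salience) and extract the
    # passage closest to each theme's centroid.
    themes = sorted(cluster_themes(passages, threshold),
                    key=lambda t: len(t["members"]), reverse=True)
    summary = []
    for theme in themes[:max_themes]:
        rep = max(theme["members"],
                  key=lambda i: cosine(Counter(tokenize(passages[i])), theme["centroid"]))
        summary.append(passages[rep])
    return summary


passages = [
    "The earthquake struck the city at dawn",
    "Rescue teams searched the city after the earthquake",
    "Stock markets fell sharply on Monday",
    "The earthquake damaged many buildings in the city",
]
print(summarize(passages, threshold=0.3, max_themes=1))
```

In this sketch, raising `threshold` yields more, narrower themes, and lowering it yields fewer, broader ones, which is one plausible way to expose granularity to the user.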
Index Terms
- Cross-document summarization by concept classification