Cross-document summarization by concept classification

Authors:
Hilda Hardy, NLIP Laboratory, University at Albany, Albany, NY
Nobuyuki Shimizu, NLIP Laboratory, University at Albany, Albany, NY
Tomek Strzalkowski, NLIP Laboratory, University at Albany, Albany, NY
Liu Ting, NLIP Laboratory, University at Albany, Albany, NY
Xinyang Zhang, NLIP Laboratory, University at Albany, Albany, NY
G. Bowden Wise, GE Global Research Center, Niskayuna, NY

SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002, Pages 121–128
https://doi.org/10.1145/564376.564399

Published: 11 August 2002

ABSTRACT

In this paper we describe XDoX, a cross-document summarizer designed specifically to summarize large document sets (50-500 documents or more). Such sets are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set, at a level of granularity controlled by the user, and composing an extraction summary that reflects these main themes. The current version of XDoX is not optimized to produce a summary from a few unrelated documents; such summaries are best obtained simply by concatenating summaries of the individual documents. We show examples of summaries obtained in our tests and in our participation in the first Document Understanding Conference (DUC).
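
The general idea of theme-based extraction can be illustrated with a minimal sketch. This is not the XDoX algorithm, whose details are not given in the abstract; it is a simple stand-in that assumes passages are grouped greedily by cosine similarity over raw term counts, that a theme's salience is its size, and that each theme is represented by its most central passage. The function names, the `threshold` parameter (standing in for user-controlled granularity), and `max_themes` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_into_themes(passages, threshold=0.3):
    """Greedy single-pass grouping (an assumption, not the paper's method).

    A passage joins the first theme whose seed passage is at least
    `threshold` similar; otherwise it starts a new theme. A higher
    threshold yields more, finer-grained themes.
    """
    vectors = [Counter(tokenize(p)) for p in passages]
    themes = []  # each theme is a list of passage indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for theme in themes:
            if cosine(vec, vectors[theme[0]]) >= threshold:
                theme.append(i)
                break
        else:
            themes.append([i])
    return themes, vectors


def summarize(passages, threshold=0.3, max_themes=3):
    """Return one representative passage for each of the largest themes."""
    themes, vectors = group_into_themes(passages, threshold)
    # Treat theme size as salience; the stable sort keeps first-seen order on ties.
    salient = sorted(themes, key=len, reverse=True)[:max_themes]
    summary = []
    for theme in salient:
        # Representative: the member most similar to the theme as a whole.
        best = max(
            theme,
            key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in theme),
        )
        summary.append(passages[best])
    return summary


passages = [
    "The earthquake struck the city at dawn.",
    "An earthquake struck the city early at dawn.",
    "Stock markets fell sharply on Monday.",
    "The earthquake struck the coastal city at dawn.",
]
print(summarize(passages, threshold=0.3, max_themes=2))
```

Raising `threshold` splits the passages into more themes, which loosely mirrors the user-regulated granularity the abstract mentions; a production system would likely use weighted terms and a more robust clustering method.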

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Language resources

Published in
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
August 2002, 478 pages
ISBN: 1581135610
DOI: 10.1145/564376
General Chair: Kalervo Järvelin (University of Tampere, Finland)
Program Chairs: Micheline Beaulieu (University of Sheffield, UK), Ricardo Baeza-Yates (University of Chile, Chile), Sung Hyon Myaeng (Chungnam National University, Korea)
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
multi-document summarization
n-grams
passage similarity
summary
term weights
Qualifiers
- Article

Acceptance Rates
SIGIR '02 paper acceptance rate: 44 of 219 submissions, 20%
Overall acceptance rate: 792 of 3,983 submissions, 20%
Article Metrics
- Total Citations: 43
- Total Downloads: 1,302
- Downloads (Last 12 months): 4
- Downloads (Last 6 weeks): 1