DOI: 10.1145/564376.564399

Cross-document summarization by concept classification

Published: 11 August 2002

ABSTRACT

In this paper we describe XDoX, a cross-document summarizer designed specifically to summarize large document sets (50-500 documents and more). Such sets of documents are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set (at a granularity level regulated by the user) and composing an extraction summary that reflects these main themes. In the current version, XDoX is not optimized to produce a summary based on a few unrelated documents; indeed, such summaries are best obtained simply by concatenating summaries of individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document Understanding Conference (DUC).
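
The general idea of theme-based extraction can be illustrated with a small sketch. This is not the XDoX algorithm, which the abstract does not specify; it is a minimal illustration that assumes a greedy word-overlap grouping of sentences into themes and a hypothetical `threshold` parameter standing in for the user-regulated granularity. All function names, the stopword list, and the scoring rule are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was",
             "for", "by", "at", "as"}

def tokens(sentence):
    """Return the set of lowercase content words in a sentence."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS}

def jaccard(a, b):
    """Jaccard overlap between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_into_themes(sentences, threshold):
    """Greedy single-pass grouping: a sentence joins the first theme whose
    pooled vocabulary overlaps it by at least `threshold`; otherwise it
    starts a new theme. A higher threshold yields finer-grained themes."""
    themes = []  # each theme: {"vocab": set of words, "members": [sentences]}
    for s in sentences:
        words = tokens(s)
        for theme in themes:
            if jaccard(words, theme["vocab"]) >= threshold:
                theme["members"].append(s)
                theme["vocab"] |= words
                break
        else:
            themes.append({"vocab": set(words), "members": [s]})
    return themes

def summarize(sentences, threshold=0.2, max_themes=2):
    """Pick one representative sentence from each of the largest themes."""
    themes = group_into_themes(sentences, threshold)
    # Largest themes first; the sort is stable, so ties keep input order.
    themes.sort(key=lambda th: len(th["members"]), reverse=True)
    summary = []
    for theme in themes[:max_themes]:
        # Count how many theme members use each word.
        counts = Counter(w for m in theme["members"] for w in tokens(m))
        # Representative: the member whose words are most shared in the theme.
        best = max(theme["members"],
                   key=lambda m: sum(counts[w] for w in tokens(m)))
        summary.append(best)
    return summary
```

Calling `summarize(sentences)` on a list of sentences pooled from the whole document set returns one extracted sentence per dominant theme. Raising `threshold` splits the set into more, narrower themes, and lowering it merges them, which is one simple way to model a user-controlled granularity level.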

      Published in

      SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
      August 2002
      478 pages
      ISBN: 1581135610
      DOI: 10.1145/564376

      Copyright © 2002 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      SIGIR '02 Paper Acceptance Rate: 44 of 219 submissions, 20%
      Overall Acceptance Rate: 792 of 3,983 submissions, 20%