research-article

Using topic themes for multi-document summarization

Published: 02 July 2010

Abstract

The problem of using topic representations for multidocument summarization (MDS) has received considerable attention recently. Several topic representations have been employed for producing informative and coherent summaries. In this article, we describe five previously known topic representations and introduce two novel representations of topics based on topic themes. We present eight different methods of generating multidocument summaries and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use alternative topic representations.



Reviews

Quinsulon Israel

With the increase of digital text and the rise of related metadata, there is a growing interest in finding ways to reduce information overload while still maintaining the most important and useful content. Focused multi-document summarization (MDS) is a process that seeks to condense collections of documents that are related by a query, question, topic, or category down to a passage of only several sentences.

Harabagiu and Lacatusu present research based on topic themes, a new method of topic representation. Topic themes not only improve all aspects of the MDS process, but they also improve one's understanding of the performance of the various focus-based techniques and their various combinations, with different extraction, compression, and selection methods. These topic themes are basically simple predicate-argument structures. In short, the research compares two of their own novel representations with five state-of-the-art topic representations that use eight theme selection methods (in all, 40 MDS system combinations).

Because of the many explanations of the various MDS topic representation techniques, the fundamental MDS and evaluation measures, and the authors' methodology, the paper is a bit verbose and information dense. That being said, the paper's clear writing style makes it accessible to new computational linguistics and natural language processing students, who should read this paper in its entirety. However, experts (readers who are already very familiar with information retrieval and MDS) should use this source as a reference. This elucidation of the MDS field is a great example of thorough experimentation.

Online Computing Reviews Service

Information & Contributors

Information

Published In

ACM Transactions on Information Systems, Volume 28, Issue 3
June 2010
231 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/1777432
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2010
Accepted: 01 August 2009
Revised: 01 June 2009
Received: 01 October 2008
Published in TOIS Volume 28, Issue 3

Author Tags

  1. Summarization
  2. topic representations
  3. topic themes

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Feb 2025

Citations

Cited By

  • (2023) Systematic Review of Automatic Arabic Text Summarization Techniques. Business Intelligence and Information Technology, 783-796. DOI: 10.1007/978-981-99-3416-4_63. Online publication date: 4-Jul-2023
  • (2022) SemG-TS: Abstractive Arabic Text Summarization Using Semantic Graph Embedding. Mathematics 10:18 (3225). DOI: 10.3390/math10183225. Online publication date: 6-Sep-2022
  • (2021) Two-Phase Multidocument Summarization Through Content-Attention-Based Subtopic Detection. IEEE Transactions on Computational Social Systems 8:6 (1379-1392). DOI: 10.1109/TCSS.2021.3079206. Online publication date: Dec-2021
  • (2021) A Naive Extractive Text Summarizer for Assamese Language. 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), 1712-1717. DOI: 10.1109/ICIRCA51532.2021.9544769. Online publication date: 2-Sep-2021
  • (2021) RETRACTED: A Comparative Analysis of Pre-Processing Time in Summary of Hindi Language using Stanza and Spacy. IOP Conference Series: Materials Science and Engineering 1110:1 (012019). DOI: 10.1088/1757-899X/1110/1/012019. Online publication date: 1-Mar-2021
  • (2021) Small, narrow, and parallel recurrent neural networks for sentence representation in extractive text summarization. Journal of Ambient Intelligence and Humanized Computing 13:9 (4151-4157). DOI: 10.1007/s12652-021-03583-1. Online publication date: 6-Nov-2021
  • (2019) Multi-Document Extractive Summarization as a Non-linear Combinatorial Optimization Problem. Nonlinear Combinatorial Optimization, 295-308. DOI: 10.1007/978-3-030-16194-1_15. Online publication date: 1-Jun-2019
  • (2019) Clustering and Its Extensions in the Social Media Domain. Adaptive Resonance Theory in Social Media Data Clustering, 15-44. DOI: 10.1007/978-3-030-02985-2_2. Online publication date: 1-May-2019
  • (2018) A data-driven analysis of the knowledge structure of library science with full-text journal articles. Journal of Librarianship and Information Science 52:2 (345-365). DOI: 10.1177/0961000618793977. Online publication date: 9-Oct-2018
  • (2018) Structured Text Summarization via Open Domain Information Extraction. 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD), 701-706. DOI: 10.1109/CSCWD.2018.8465372. Online publication date: May-2018
