research-article

Using topic themes for multi-document summarization

Published: 2 July 2010

Abstract

The problem of using topic representations for multidocument summarization (MDS) has received considerable attention recently. Several topic representations have been employed for producing informative and coherent summaries. In this article, we describe five previously known topic representations and introduce two novel representations of topics based on topic themes. We present eight different methods of generating multidocument summaries and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.
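The abstract does not say how a theme-based summarizer selects sentences. The sketch below is a hypothetical illustration, not the authors' method. It follows the reviewer's description of topic themes as simple predicate-argument structures and assumes that every sentence has already been annotated with such themes by some upstream semantic parser, which is not shown. The names `Theme`, `Sentence`, `weight_themes`, and `summarize`, the document-frequency weighting, the greedy coverage selection, and the PropBank-style role strings `A0`/`A1` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    """Hypothetical topic theme: a predicate plus its (role, argument) pairs."""
    predicate: str
    arguments: tuple[tuple[str, str], ...]


@dataclass
class Sentence:
    """A candidate sentence with the themes assumed to be extracted from it upstream."""
    doc_id: str
    text: str
    themes: frozenset[Theme]


def weight_themes(sentences: list[Sentence]) -> dict[Theme, int]:
    """Weight each theme by the number of distinct documents that mention it."""
    docs_per_theme: dict[Theme, set[str]] = defaultdict(set)
    for s in sentences:
        for t in s.themes:
            docs_per_theme[t].add(s.doc_id)
    return {t: len(docs) for t, docs in docs_per_theme.items()}


def summarize(sentences: list[Sentence], max_sentences: int = 3) -> list[str]:
    """Greedily select sentences that add the most not-yet-covered theme weight."""
    weights = weight_themes(sentences)
    covered: set[Theme] = set()
    candidates = list(sentences)
    summary: list[str] = []

    def gain(s: Sentence) -> int:
        # Only themes absent from the summary so far count, which penalizes redundancy.
        return sum(weights[t] for t in s.themes - covered)

    while candidates and len(summary) < max_sentences:
        best = max(candidates, key=gain)  # ties go to the earliest candidate
        if gain(best) == 0:
            break  # every remaining sentence repeats themes already covered
        summary.append(best.text)
        covered |= best.themes
        candidates.remove(best)
    return summary


# Toy input (hypothetical): one storm event reported across three documents.
storm_kills = Theme("kill", (("A0", "storm"), ("A1", "residents")))
storm_hits = Theme("hit", (("A0", "storm"), ("A1", "coast")))
aid_sent = Theme("send", (("A0", "government"), ("A1", "aid")))

docs = [
    Sentence("d1", "The storm killed 12 residents.", frozenset({storm_kills})),
    Sentence("d2", "Twelve residents were killed by the storm.", frozenset({storm_kills})),
    Sentence("d2", "The storm hit the coast on Monday.", frozenset({storm_hits})),
    Sentence("d3", "The government sent aid.", frozenset({aid_sent})),
    Sentence("d3", "The storm hit the coast and killed residents.",
             frozenset({storm_hits, storm_kills})),
]

print(summarize(docs))
# → ['The storm hit the coast and killed residents.', 'The government sent aid.']
```

Weighting by the number of documents rather than by raw mentions keeps a theme repeated many times within a single article from dominating the selection. Whether the paper weights or selects themes this way is not stated on this page.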


      Reviews

      Quinsulon Israel

      With the growth of digital text and its associated metadata, there is increasing interest in ways to reduce information overload while preserving the most important and useful content. Focused multi-document summarization (MDS) condenses a collection of documents related by a query, question, topic, or category into a passage of only a few sentences. Harabagiu and Lacatusu present research based on topic themes, a new method of topic representation; these topic themes are essentially simple predicate-argument structures. Topic themes not only improve all aspects of the MDS process, but also improve one's understanding of how the various focus-based techniques perform, alone and in combination with different extraction, compression, and selection methods. In short, the research compares two of the authors' own novel representations with five state-of-the-art topic representations that use eight theme selection methods (in all, 40 MDS system combinations).

      Because it explains at length the various MDS topic representation techniques, the fundamentals of MDS and its evaluation measures, and the authors' methodology, the paper is somewhat verbose and information-dense. That said, its clear writing style makes it accessible to students new to computational linguistics and natural language processing, who should read it in its entirety. Experts (readers already very familiar with information retrieval and MDS) should instead use it as a reference. This elucidation of the MDS field is a great example of thorough experimentation.

      Online Computing Reviews Service


      • Published in

        ACM Transactions on Information Systems, Volume 28, Issue 3
        June 2010
        231 pages
        ISSN: 1046-8188
        EISSN: 1558-2868
        DOI: 10.1145/1777432

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 July 2010
        • Accepted: 1 August 2009
        • Revised: 1 June 2009
        • Received: 1 October 2008
        Published in TOIS Volume 28, Issue 3


        Qualifiers

        • research-article
        • Research
        • Refereed
