Abstract
The problem of using topic representations for multidocument summarization (MDS) has received considerable attention recently. Several topic representations have been employed to produce informative and coherent summaries. In this article, we describe five previously known topic representations and introduce two novel representations of topics based on topic themes. We present eight different methods of generating multidocument summaries and evaluate each of them on a large set of topics from past Document Understanding Conference (DUC) workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use alternative topic representations.
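The abstract does not spell out any individual topic representation. As a point of reference only, one widely used representation in extractive summarization is the topic signature of Lin and Hovy (2000): the set of terms that are significantly more frequent in the documents about a topic than in a background corpus, ranked by Dunning's log-likelihood ratio. The Python sketch below is an illustration, not the article's implementation. The function names, the toy token lists, and the cutoff of 10.83 (the chi-squared critical value for one degree of freedom at p = 0.001) are assumptions.

```python
import math
from collections import Counter


def _log_likelihood(k, n, p):
    """Binomial log-likelihood k*log(p) + (n - k)*log(1 - p), treating 0*log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def llr(k1, n1, k2, n2):
    """Dunning's -2 log(lambda) for a term seen k1 times in n1 topic tokens
    and k2 times in n2 background tokens (n1 and n2 must both be > 0)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled estimate under the "no difference" hypothesis
    return 2.0 * (
        _log_likelihood(k1, n1, p1)
        + _log_likelihood(k2, n2, p2)
        - _log_likelihood(k1, n1, p)
        - _log_likelihood(k2, n2, p)
    )


def topic_signature(topic_tokens, background_tokens, threshold=10.83):
    """Return (term, score) pairs over-represented in the topic, highest score first.

    The 10.83 threshold is an assumed cutoff, not a value taken from the article."""
    topic = Counter(topic_tokens)
    background = Counter(background_tokens)
    n1, n2 = sum(topic.values()), sum(background.values())
    signature = []
    for term, k1 in topic.items():
        k2 = background[term]  # Counter returns 0 for terms absent from the background
        if k1 / n1 <= k2 / n2:
            continue  # skip terms that are not more frequent in the topic documents
        score = llr(k1, n1, k2, n2)
        if score >= threshold:
            signature.append((term, score))
    signature.sort(key=lambda pair: pair[1], reverse=True)
    return signature


if __name__ == "__main__":
    # Toy, hypothetical token counts for illustration only.
    topic_docs = ["earthquake"] * 20 + ["the"] * 20 + ["rescue"] * 10
    background = ["the"] * 400 + ["city"] * 599 + ["earthquake"]
    for term, score in topic_signature(topic_docs, background):
        print(f"{term}\t{score:.2f}")
```

On this toy input, "earthquake" scores higher than "rescue", and "the" is excluded because its relative frequency is the same in both collections. Richer representations, such as the topic themes the article introduces, go beyond a flat list of weighted terms.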
Index Terms
- Using topic themes for multi-document summarization