ABSTRACT
This paper presents our recent research on text summarization. The field has moved from multi-document summarization to update summarization. When producing an update summary of a set of topic-related documents, the summarizer assumes that the reader already has prior knowledge, determined by a set of older documents on the same topic. The update summarizer must therefore solve a novelty vs. redundancy problem. We describe the development of our summarizer, which is based on Iterative Residual Rescaling (IRR), a method that creates the latent semantic space of the set of documents under consideration. IRR generalizes Singular Value Decomposition (SVD) and makes it possible to control the influence of major and minor topics in the latent space. Our sentence-extractive summarization method computes the redundancy, novelty and significance of each topic. These values are then used in the sentence selection process, whose selection component also prevents redundancy within the summary itself. The results of our participation in the TAC evaluation appear promising.
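To illustrate how IRR generalizes SVD, the following is a minimal sketch of the commonly described IRR procedure, not the implementation used in our summarizer. The function name `irr_basis`, the parameter names `D`, `k` and `q`, and the use of NumPy are assumptions made for this example. Each residual document vector is rescaled by its length raised to the power `q` before the next basis vector is extracted; with `q = 0` the procedure yields the same basis vectors as SVD, while larger `q` gives more weight to documents that the basis found so far represents poorly, which raises the influence of minor topics.

```python
import numpy as np

def irr_basis(D, k, q):
    """Sketch of Iterative Residual Rescaling (illustrative names).

    D : term-by-document matrix, one column per document
    k : number of basis vectors (latent topics) to extract
    q : rescaling exponent; q = 0 reduces to plain SVD
    Returns a matrix whose k columns are orthonormal basis vectors.
    """
    R = np.asarray(D, dtype=float).copy()  # residual vectors, initially the documents
    basis = []
    for _ in range(k):
        lengths = np.linalg.norm(R, axis=0)   # length of each residual vector
        Rs = R * lengths ** q                 # rescale each residual by |r|^q
        # The leading left singular vector of Rs is the first eigenvector of Rs Rs^T.
        U, _, _ = np.linalg.svd(Rs, full_matrices=False)
        b = U[:, 0]
        basis.append(b)
        R = R - np.outer(b, b @ R)            # remove the component along b
    return np.column_stack(basis)

# Small hypothetical term-by-document matrix (5 terms, 4 documents).
D = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 3, 0, 1],
              [0, 0, 1, 2],
              [1, 0, 0, 1]], dtype=float)

B_svd_like = irr_basis(D, k=2, q=0)   # same directions as the top-2 left singular vectors
B_rescaled = irr_basis(D, k=3, q=2)   # residual rescaling emphasizes poorly covered documents
```

The exponent `q` is the control knob mentioned above: it is the only difference between this loop and a deflation-based SVD.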
Index Terms
- Update summarization based on novel topic distribution