DOI: 10.1145/1600193.1600239
research-article

Update summarization based on novel topic distribution

Published: 16 September 2009

ABSTRACT

This paper presents our recent research in text summarization. The field has moved from multi-document summarization to update summarization. When producing an update summary of a set of topic-related documents, the summarizer assumes that the reader already has prior knowledge, determined by a set of older documents on the same topic. The update summarizer must therefore solve a novelty vs. redundancy problem. We describe the development of our summarizer, which is based on Iterative Residual Rescaling (IRR), a method that creates the latent semantic space of the set of documents under consideration. IRR generalizes Singular Value Decomposition (SVD) and makes it possible to control the influence of major and minor topics in the latent space. Our sentence-extractive summarization method computes the redundancy, novelty, and significance of each topic. These values are then used in the sentence selection process. The sentence selection component also prevents redundancy within the summary. The results of our participation in the TAC evaluation appear promising.
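
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain SVD in place of IRR, and the scoring formulas are assumptions chosen for illustration (redundancy is the maximum cosine similarity to an old topic, novelty is one minus redundancy, significance is the normalized singular value). The function names `top_topics`, `topic_weights`, and `select_sentences` and the toy matrices are hypothetical.

```python
import numpy as np

def top_topics(A, k, tol=1e-10):
    """Return the top-k left singular vectors (topics) and singular values of A.

    A is a term-by-sentence matrix. Topics with a (near-)zero singular
    value are dropped because their direction is arbitrary.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    keep = s[:k] > tol
    return U[:, :k][:, keep], s[:k][keep]

def topic_weights(A_old, A_new, k=2):
    """Weight each topic of the new set by novelty * significance (assumed formulas)."""
    U_old, _ = top_topics(A_old, k)
    U_new, s_new = top_topics(A_new, k)
    if U_old.shape[1] == 0:
        redundancy = np.zeros(U_new.shape[1])
    else:
        # Topic vectors have unit length, so |dot product| is the cosine similarity.
        redundancy = np.abs(U_new.T @ U_old).max(axis=1)
    novelty = np.clip(1.0 - redundancy, 0.0, 1.0)
    significance = s_new / s_new.sum()
    return U_new, novelty * significance

def select_sentences(A_new, U_new, weights, n=1):
    """Greedily pick sentences with the largest weighted topic vectors.

    After each pick, the component along the chosen sentence is removed
    from all sentences, which discourages redundancy inside the summary.
    """
    F = (U_new.T @ A_new) * weights[:, None]  # topics x sentences
    picked = np.zeros(F.shape[1], dtype=bool)
    chosen = []
    for _ in range(min(n, F.shape[1])):
        norms = np.linalg.norm(F, axis=0)
        norms[picked] = -1.0  # never pick the same sentence twice
        best = int(np.argmax(norms))
        picked[best] = True
        chosen.append(best)
        v = F[:, best].copy()
        denom = v @ v
        if denom > 0:
            F = F - np.outer(v, (v @ F) / denom)
    return chosen

# Toy data: rows are 4 terms, columns are sentences.
# The old documents only use terms 0 and 1.
A_old = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [0.0, 0.0],
                  [0.0, 0.0]])
# New documents: sentence 0 repeats old content, sentences 1 and 2 use new terms.
A_new = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 1.0, 0.0]])

U_new, weights = topic_weights(A_old, A_new, k=2)
chosen = select_sentences(A_new, U_new, weights, n=1)
```

In this toy example, the topic that coincides with the old set receives a weight close to zero, so the selected sentence is one built from terms the old documents never used. IRR would additionally rescale residuals between iterations to balance major and minor topics; that step is omitted here.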


          Published in

          DocEng '09: Proceedings of the 9th ACM symposium on Document engineering
          September 2009
          264 pages
          ISBN: 9781605585758
          DOI: 10.1145/1600193

          Copyright © 2009 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 178 of 537 submissions, 33%
