ABSTRACT
Classic news summarization plays an important role given the exponential growth of documents on the Web. Many approaches have been proposed to generate summaries, but they seldom consider the evolutionary characteristics of news alongside traditional summary elements. We therefore present a novel framework for this web mining problem, named Evolutionary Timeline Summarization (ETS). Given a massive collection of time-stamped web documents related to a general news query, ETS aims to return the evolution trajectory along the timeline, consisting of individual but correlated summaries for each date, emphasizing relevance, coverage, coherence and cross-date diversity. ETS greatly facilitates fast news browsing and knowledge comprehension and hence is a necessity. We formulate the task as an optimization problem solved via iterative substitution, moving from a set of sentences to a subset of sentences that satisfies the above requirements while balancing coherence against diversity and local against global summary quality. The optimized substitution is conducted iteratively, incorporating several constraints, until convergence. We develop experimental systems and evaluate them on 6 intrinsically different datasets totaling 10251 documents. Performance comparisons between different system-generated timelines and timelines manually created by human editors demonstrate the effectiveness of our proposed framework in terms of ROUGE metrics.
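To make the idea of iterative substitution concrete, the following is a minimal sketch for a single date, not the authors' formulation. It assumes a toy objective (Jaccard overlap with the query as relevance, minus a penalty for pairwise overlap between selected sentences as a stand-in for diversity); the function names, the `penalty` weight and the scoring are all hypothetical, and the cross-date and coherence terms described in the abstract are omitted.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(summary, query, penalty=0.5):
    """Toy objective (assumption): query relevance minus a redundancy penalty."""
    relevance = sum(jaccard(s, query) for s in summary)
    redundancy = sum(jaccard(a, b) for a, b in combinations(summary, 2))
    return relevance - penalty * redundancy


def iterative_substitution(sentences, query, k, penalty=0.5):
    """Swap sentences in and out until no single swap improves the score."""
    tokens = [frozenset(s.lower().split()) for s in sentences]
    q = frozenset(query.lower().split())
    # Start from the first k sentences as an arbitrary initial summary.
    selected = list(range(min(k, len(tokens))))
    best = score([tokens[i] for i in selected], q, penalty)
    improved = True
    while improved:  # stops once a full pass accepts no substitution
        improved = False
        for pos in range(len(selected)):
            for cand in range(len(tokens)):
                if cand in selected:
                    continue
                # Try replacing the sentence at `pos` with candidate `cand`.
                trial = selected[:pos] + [cand] + selected[pos + 1:]
                trial_score = score([tokens[i] for i in trial], q, penalty)
                if trial_score > best + 1e-12:
                    selected, best, improved = trial, trial_score, True
    return [sentences[i] for i in sorted(selected)], best


if __name__ == "__main__":
    docs = [
        "oil spill reaches coast",
        "oil spill reaches coast today",
        "cleanup crews deployed to coast",
        "stock market rises",
    ]
    summary, value = iterative_substitution(docs, "oil spill coast cleanup", k=2)
    print(summary, round(value, 3))
```

Because every accepted substitution strictly increases the score and the number of possible subsets is finite, the loop always terminates, though only at a local optimum of this toy objective.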
Index Terms
- Evolutionary timeline summarization: a balanced optimization framework via iterative substitution