Abstract
We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports match or a TV drama. Two characteristics of microblog data stand out: documents are often highly redundant, and they are aligned on a timeline. A single event within a topic can attract thousands of documents, and two very similar documents can refer to two distinct events when they are temporally distant. We examine microblog data to better understand these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with a fast approximate algorithm for generating a summary. We show empirically that our model generates good summaries on datasets of microblog documents about sports matches.
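The interplay between redundancy and temporal distance can be illustrated with a small sketch. This is not the model or algorithm proposed in the paper, whose details are not given in the abstract. It is a generic greedy coverage heuristic written under stated assumptions: the `Post` type, the Jaccard word-overlap `similarity`, the hard time `window`, and the `greedy_summary` function are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    time: float  # seconds since the start of the stream (assumed unit)
    text: str


def similarity(a: Post, b: Post, window: float) -> float:
    """Jaccard word overlap, forced to 0 when posts are more than `window` apart."""
    if abs(a.time - b.time) > window:
        return 0.0
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def coverage(posts, chosen, window):
    """Sum, over all posts, of the best similarity to any chosen post."""
    return sum(
        max((similarity(p, posts[j], window) for j in chosen), default=0.0)
        for p in posts
    )


def greedy_summary(posts, k, window):
    """Pick up to k post indices, each time adding the post with the largest coverage gain."""
    chosen = []
    for _ in range(k):
        base = coverage(posts, chosen, window)
        best, best_gain = None, 0.0
        for i in range(len(posts)):
            if i in chosen:
                continue
            gain = coverage(posts, chosen + [i], window) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining post improves coverage
            break
        chosen.append(best)
    return sorted(chosen, key=lambda i: posts[i].time)


# Two bursts of near-identical posts, roughly 49 minutes apart.
posts = [
    Post(60, "goal by team A"),
    Post(65, "goal by team A !!"),
    Post(70, "what a goal by team A"),
    Post(3000, "goal by team A"),
    Post(3010, "another goal by team A"),
]
summary = greedy_summary(posts, k=2, window=300)
for i in summary:
    print(posts[i].time, posts[i].text)
```

With a 300-second window, the sketch selects one post from each burst, treating the two goals as separate events. With a window large enough to span the whole stream, the identical post at 3000 seconds adds no coverage once the post at 60 seconds is chosen, so the later event can go unrepresented. A hard cutoff is only one way to account for time; a smooth decay would be an equally plausible assumption.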
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takamura, H., Yokono, H., Okumura, M. (2011). Summarizing a Document Stream. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_18
DOI: https://doi.org/10.1007/978-3-642-20161-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)