Summarizing a Document Stream

Takamura, Hiroya; Yokono, Hikaru; Okumura, Manabu

doi:10.1007/978-3-642-20161-5_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports match or a TV drama. Two characteristics of microblog data stand out: documents are often highly redundant, and they are aligned on a timeline. A single event within a topic can attract thousands of documents, and two very similar documents will refer to two distinct events when they are temporally distant. We examine microblog data to better understand these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with a fast approximate algorithm for generating a summary. We show empirically that our model generates good summaries on datasets of microblog documents about sports matches.
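
The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea, not the authors' formulation or algorithm. It assumes a coverage-style objective in which a selected post "covers" other posts in proportion to their word overlap, and in which posts further apart than a time window do not cover each other at all. The names `similarity`, `greedy_summary`, and `window`, the Jaccard measure, and the greedy selection are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# A post is (timestamp in seconds, set of words). Hypothetical representation.
Doc = Tuple[float, Set[str]]


def similarity(a: Doc, b: Doc, window: float) -> float:
    """Jaccard word overlap, forced to 0 when posts are more than `window` apart."""
    if abs(a[0] - b[0]) > window:
        return 0.0
    union = a[1] | b[1]
    return len(a[1] & b[1]) / len(union) if union else 0.0


def greedy_summary(docs: List[Doc], k: int, window: float) -> List[int]:
    """Greedily pick up to k posts that maximise total time-aware coverage.

    Coverage of a post is its best similarity to any selected post.
    Ties are broken in favour of the earlier index.
    """
    best = [0.0] * len(docs)  # current coverage of each post
    chosen: List[int] = []
    for _ in range(min(k, len(docs))):
        candidates = []
        for c in range(len(docs)):
            if c in chosen:
                continue
            # Marginal gain in coverage if post c were added to the summary.
            gain = sum(
                max(0.0, similarity(docs[c], d, window) - best[i])
                for i, d in enumerate(docs)
            )
            candidates.append((gain, -c, c))
        if not candidates:
            break
        _, _, pick = max(candidates)
        chosen.append(pick)
        for i, d in enumerate(docs):
            best[i] = max(best[i], similarity(docs[pick], d, window))
    return sorted(chosen)


# Two bursts of near-identical posts, roughly 50 minutes apart.
docs: List[Doc] = [
    (0.0, {"goal", "messi"}),
    (10.0, {"goal", "messi", "great"}),
    (20.0, {"goal", "messi"}),
    (3000.0, {"goal", "messi"}),
    (3010.0, {"goal", "messi", "again"}),
]
print(greedy_summary(docs, k=2, window=60.0))  # [0, 3]
```

With a 60-second window the sketch picks one post from each burst, even though posts 0 and 3 have identical wording, which mirrors the observation that similar documents far apart in time describe distinct events. A text-only similarity would treat all five posts as a single redundant cluster.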

Author information

Authors and Affiliations

Precision and Intelligence Laboratory, Tokyo Institute of Technology, Japan
Hiroya Takamura, Hikaru Yokono & Manabu Okumura

Editor information

Editors and Affiliations

Information School, University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK
Paul Clough
CLARITY: Centre for Sensor Web Technologies, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
Colum Foley, Cathal Gurrin & Hyowon Lee
Centre for Next Generation Localisation, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
Gareth J. F. Jones
TNO Human Factors, Brassersplein 2, 2612 CT, Delft, The Netherlands
Wessel Kraaij
Yahoo! Research, 177 Diagonal, 08018, Barcelona, Spain
Vanessa Murdock

About this paper

Cite this paper

Takamura, H., Yokono, H., Okumura, M. (2011). Summarizing a Document Stream. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_18

DOI: https://doi.org/10.1007/978-3-642-20161-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)
