Summarizing a Document Stream

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports match or a TV drama. Two characteristics of microblog data stand out: the documents are often highly redundant, and they are ordered along a timeline. A single event within a topic can attract thousands of documents, and two very similar documents may refer to two distinct events if they are temporally distant. We examine microblog data to better understand these characteristics, and we propose a summarization model for a stream of short documents on a timeline, along with a fast approximate algorithm for generating the summary. We show empirically that our model generates good summaries on datasets of microblog documents about sports matches.
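The observation that similar documents far apart in time may describe different events can be illustrated with a small sketch. The Python below is not the model or the algorithm proposed in the paper. It is a minimal greedy, facility-location-style selection written under stated assumptions: similarity is word-overlap (Jaccard), a selected post can only "cover" posts within a hypothetical `window` of its own timestamp, and the names `Post`, `jaccard`, `coverage`, and `greedy_summary` are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    time: float  # posting time, e.g. seconds since the start of a match
    text: str


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two short texts (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def coverage(candidate: Post, other: Post, window: float) -> float:
    """How well `candidate` represents `other`; zero if they are far apart in time."""
    if abs(candidate.time - other.time) > window:
        return 0.0
    return jaccard(candidate.text, other.text)


def greedy_summary(posts: list[Post], k: int, window: float) -> list[Post]:
    """Greedily pick up to k posts that maximize total coverage of the stream."""
    best = [0.0] * len(posts)  # best coverage each post has received so far
    chosen: list[int] = []
    for _ in range(min(k, len(posts))):
        top_gain, top_idx = 0.0, None
        for i, cand in enumerate(posts):
            if i in chosen:
                continue
            # Marginal gain: improvement in coverage over what is already selected.
            gain = sum(
                max(coverage(cand, p, window) - best[j], 0.0)
                for j, p in enumerate(posts)
            )
            if gain > top_gain:
                top_gain, top_idx = gain, i
        if top_idx is None:  # no remaining post adds any coverage
            break
        chosen.append(top_idx)
        for j, p in enumerate(posts):
            best[j] = max(best[j], coverage(posts[top_idx], p, window))
    # Return the summary in timeline order.
    return [posts[i] for i in sorted(chosen, key=lambda i: posts[i].time)]


# Made-up example data: two bursts of near-identical posts, far apart in time.
posts = [
    Post(10, "goal by tanaka"),
    Post(12, "goal by tanaka !!"),
    Post(11, "tanaka scores goal"),
    Post(3000, "goal by tanaka"),
    Post(3002, "goal by tanaka again"),
    Post(3001, "what a goal by tanaka"),
]
summary = greedy_summary(posts, k=2, window=60)
for post in summary:
    print(post.time, post.text)
```

With the 60-second window, the two selected posts come from the two separate bursts, even though the bursts contain identical text. Making the window large enough to span the whole stream removes the temporal constraint, and the second burst is then treated as redundant with the first.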

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takamura, H., Yokono, H., Okumura, M. (2011). Summarizing a Document Stream. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_18

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science, Computer Science (R0)
