Continuous Summarization over Microblog Threads

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10178)

Abstract

With the dramatic growth of social media users, microblogs are created and shared at an unprecedented rate. The high velocity and large volume of these short text posts bring redundancy and noise, making it hard for users and analysts to extract useful information. In this paper, we formalize the problem from a summarization angle as Continuous Summarization over Microblog Threads (CSMT), which considers three facets: information gain of the microblog dialogue, diversity, and temporal information. This summarization problem differs from the classic ones in two aspects: (i) it is defined over large-scale, dynamic data with a high update frequency; (ii) the context between microblogs is taken into account. We first prove that the CSMT problem is NP-hard. We then propose a greedy algorithm with a \((1-1/\mathrm{e})\) performance guarantee. Finally, we extend the greedy algorithm to a sliding window so that microblogs are summarized continuously for each thread. Our experimental results on large-scale datasets show that our method outperforms two baselines in terms of summary diversity and information gain, with a time cost close to that of the best-performing baseline.
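To make the greedy-plus-sliding-window idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical objective, the number of distinct terms covered by the summary, which is monotone and submodular, so the standard greedy rule attains the \((1-1/\mathrm{e})\) bound under a cardinality constraint (Nemhauser et al.). The names `Post`, `coverage`, `greedy_summary`, and `sliding_window_summaries`, and the representation of a post as a (timestamp, term set) pair, are illustrative assumptions. The actual CSMT objective combines information gain, diversity, and temporal information, which are not specified in this abstract.

```python
from collections import deque
from typing import Callable, Deque, FrozenSet, Iterable, Iterator, List, Set, Tuple

# Hypothetical representation of a microblog: (timestamp, set of terms).
Post = Tuple[int, FrozenSet[str]]


def coverage(summary: Iterable[Post]) -> int:
    """Assumed objective: number of distinct terms covered by the summary.

    Set coverage is monotone and submodular, which is what the greedy
    (1 - 1/e) guarantee requires. It stands in for the paper's objective.
    """
    covered: Set[str] = set()
    for _, terms in summary:
        covered |= terms
    return len(covered)


def greedy_summary(
    posts: List[Post],
    k: int,
    f: Callable[[Iterable[Post]], int] = coverage,
) -> List[Post]:
    """Pick up to k posts, each time adding the one with the largest marginal gain."""
    chosen: List[Post] = []
    remaining = list(posts)
    for _ in range(min(k, len(remaining))):
        base = f(chosen)
        best = max(remaining, key=lambda p: f(chosen + [p]) - base)
        if f(chosen + [best]) - base <= 0:
            break  # no remaining post adds anything new
        chosen.append(best)
        remaining.remove(best)
    return chosen


def sliding_window_summaries(
    stream: Iterable[Post], window: int, k: int
) -> Iterator[List[Post]]:
    """Yield a summary after each arrival, over posts newer than (now - window).

    Simplification: the summary is recomputed from scratch on every arrival.
    """
    buf: Deque[Post] = deque()
    for post in stream:
        buf.append(post)
        while buf and buf[0][0] <= post[0] - window:
            buf.popleft()  # evict posts that fell out of the time window
        yield greedy_summary(list(buf), k)


posts: List[Post] = [
    (1, frozenset({"a", "b"})),
    (2, frozenset({"b"})),
    (3, frozenset({"c", "d", "e"})),
    (4, frozenset({"a"})),
]
top2 = greedy_summary(posts, 2)
windowed = list(sliding_window_summaries(posts, window=2, k=1))
```

Here `top2` holds the two posts that together cover the most terms, and `windowed` holds one single-post summary per arrival. Recomputing the summary at every arrival keeps the sketch short; an incremental update that reuses the previous window's selection would be the natural refinement for high-frequency streams.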


Acknowledgement

This work was partially supported by ARC grants DP170102726 and DP170102231 and by National Natural Science Foundation of China (NSFC) grant 91646204.

Author information

Corresponding author

Correspondence to Zhifeng Bao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Song, L., Zhang, P., Bao, Z., Sellis, T. (2017). Continuous Summarization over Microblog Threads. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_31

  • DOI: https://doi.org/10.1007/978-3-319-55699-4_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55698-7

  • Online ISBN: 978-3-319-55699-4

  • eBook Packages: Computer Science, Computer Science (R0)
