Improving Tweet Timeline Generation by Predicting Optimal Retrieval Depth

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Abstract

Tweet Timeline Generation (TTG) systems provide users with informative and concise retrospective summaries of topics as they developed over time. To produce a tweet timeline that summarizes a given topic, a TTG system typically retrieves a list of potentially relevant tweets from which the timeline is eventually generated. In such a design, the performance of the timeline generation step inevitably depends on that of the retrieval step.

In this work, we aim to improve the performance of a given timeline generation system by controlling the depth of the ranked list of retrieved tweets considered in generating the timeline. We propose a supervised approach that predicts the optimal depth of the ranked tweet list for a given topic by combining estimates of list quality computed at different depths.
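The idea of predicting a per-topic retrieval depth can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the method of the paper: the cutoffs in `DEPTHS`, the two list-quality estimates (mean and standard deviation of the top-k retrieval scores), the k-nearest-neighbor regressor used as the supervised learner, and the helper names `quality_features`, `predict_depth`, and `truncate` are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical cutoffs at which list quality is estimated (an assumption).
DEPTHS = (10, 50, 100)


def quality_features(scores, depths=DEPTHS):
    """Build a feature vector from a ranked list of retrieval scores.

    For each depth k, two illustrative quality estimates are computed over
    the top-k scores: their mean and their population standard deviation.
    """
    feats = []
    for k in depths:
        top = scores[:k]
        feats.append(mean(top))
        feats.append(pstdev(top))
    return feats


def predict_depth(train, scores, k=3):
    """Predict a retrieval depth for a new topic by k-NN regression.

    `train` is a list of (feature_vector, optimal_depth) pairs taken from
    topics whose best depth is already known. The prediction is the rounded
    average depth of the k training topics closest in feature space.
    """
    x = quality_features(scores)

    def squared_distance(item):
        feats, _ = item
        return sum((a - b) ** 2 for a, b in zip(feats, x))

    nearest = sorted(train, key=squared_distance)[:k]
    return round(mean(depth for _, depth in nearest))


def truncate(ranked_tweets, depth):
    """Keep only the top `depth` tweets before timeline generation."""
    return ranked_tweets[:depth]


if __name__ == "__main__":
    # Synthetic score lists standing in for two training topics.
    steep = [1.0 / (i + 1) for i in range(200)]   # scores drop quickly
    flat = [5.0 - 0.01 * i for i in range(200)]   # scores drop slowly
    train = [(quality_features(steep), 30), (quality_features(flat), 150)]

    new_topic_scores = [1.1 / (i + 1) for i in range(200)]
    depth = predict_depth(train, new_topic_scores, k=1)
    print(depth, len(truncate(list(range(200)), depth)))
```

In this sketch the truncated list, rather than the full ranked list, would be passed to the timeline generation step. Any regression model could replace the nearest-neighbor learner.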

We conducted our experiments on a recent TREC TTG test collection of 243 million tweets and 55 topics. We experimented with 14 different retrieval models (used to retrieve the initial ranked list of tweets) and 3 different TTG models (used to generate the final timeline). Our results demonstrate the effectiveness of the proposed approach: it improved TTG performance over a strong baseline in 76% of the cases, 31% of which were statistically significant, with no significant degradation observed.

Notes

  1. No published work on the system design of the remaining 3 teams.

  2. http://www.cs.waikato.ac.nz/ml/weka/.

  3. We observed that TTG performance is less sensitive to change in list depth with large cutoffs.

Acknowledgments

This work was made possible by NPRP grant# NPRP 6-1377-1-257 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Author information

Corresponding author

Correspondence to Maram Hasanain.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hasanain, M., Elsayed, T., Magdy, W. (2015). Improving Tweet Timeline Generation by Predicting Optimal Retrieval Depth. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_11

  • DOI: https://doi.org/10.1007/978-3-319-28940-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28939-7

  • Online ISBN: 978-3-319-28940-3

  • eBook Packages: Computer Science, Computer Science (R0)
