Improving Tweet Timeline Generation by Predicting Optimal Retrieval Depth

Hasanain, Maram; Elsayed, Tamer; Magdy, Walid

doi:10.1007/978-3-319-28940-3_11

Conference paper
First Online: 22 January 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Abstract

Tweet Timeline Generation (TTG) systems provide users with informative and concise retrospective summaries of topics as they developed over time. To produce a tweet timeline that summarizes a given topic, a TTG system typically retrieves a list of potentially relevant tweets over which the timeline is eventually generated. In such a design, the performance of the timeline generation step inevitably depends on that of the retrieval step.

In this work, we aim to improve the performance of a given timeline generation system by controlling the depth of the ranked list of retrieved tweets considered in generating the timeline. We propose a supervised approach in which we predict the optimal depth of the ranked tweet list for a given topic by combining estimates of list quality computed at different depths.
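
The idea of predicting a per-topic cutoff from quality estimates at several depths can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the cutoffs in `DEPTHS`, the use of the mean retrieval score of the top-k tweets as the quality estimate, and k-nearest-neighbour regression as the supervised learner. All function names are hypothetical.

```python
from statistics import mean

# Hypothetical cutoffs at which list quality is estimated (not from the paper).
DEPTHS = (10, 50, 100)


def quality_features(scores, depths=DEPTHS):
    """Return one quality estimate per cutoff.

    Assumption: the estimate is simply the mean retrieval score of the
    top-k tweets; an empty prefix yields 0.0.
    """
    ranked = sorted(scores, reverse=True)
    return [mean(ranked[:k]) if ranked[:k] else 0.0 for k in depths]


def predict_depth(train, features, k=3):
    """Predict a retrieval depth for a new topic.

    `train` is a list of (feature_vector, optimal_depth) pairs from topics
    whose best depth is known. The stand-in learner is k-NN regression:
    average the optimal depths of the k closest training topics.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda pair: squared_distance(pair[0], features))[:k]
    return round(mean(depth for _, depth in nearest))


def truncate(ranked_tweets, depth):
    """Keep only the top `depth` tweets before timeline generation."""
    return ranked_tweets[:depth]


if __name__ == "__main__":
    # Toy training topics: (quality estimates at DEPTHS, known best depth).
    train = [
        ([0.90, 0.80, 0.70], 100),
        ([0.90, 0.50, 0.20], 20),
        ([0.80, 0.75, 0.70], 120),
        ([0.85, 0.40, 0.10], 10),
    ]
    depth = predict_depth(train, [0.88, 0.78, 0.69], k=2)
    print(depth, truncate(list(range(200)), depth)[-1])
```

In this toy setup, a topic whose estimated quality stays high at deep cutoffs is assigned a deeper list, while a topic whose quality drops quickly is truncated early. Any regression model could replace the k-NN step.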

We conducted our experiments on a recent TREC TTG test collection of 243M tweets and 55 topics. We experimented with 14 different retrieval models (used to retrieve the initial ranked list of tweets) and 3 different TTG models (used to generate the final timeline). Our results demonstrate the effectiveness of the proposed approach: it improved TTG performance over a strong baseline in 76% of the cases, 31% of which were statistically significant, with no significant degradation observed in any case.

Notes

1. No published work on the system design of the remaining 3 teams.
2. http://www.cs.waikato.ac.nz/ml/weka/
3. We observed that TTG performance is less sensitive to change in list depth with large cutoffs.

Acknowledgments

This work was made possible by NPRP grant # NPRP 6-1377-1-257 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Author information

Authors and Affiliations

Computer Science and Engineering Department, College of Engineering, Qatar University, Doha, Qatar
Maram Hasanain & Tamer Elsayed
Qatar Computing Research Institute, HBKU, Doha, Qatar
Walid Magdy

Corresponding author

Correspondence to Maram Hasanain.

Editor information

Editors and Affiliations

Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia
Guido Zuccon
Brisbane, Queensland, Australia
Shlomo Geva
University of Tsukuba, Ibaraki, Japan
Hideo Joho
RMIT University, Melbourne, Australia
Falk Scholer
School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
Aixin Sun
Tianjin University, China
Peng Zhang

About this paper

Cite this paper

Hasanain, M., Elsayed, T., Magdy, W. (2015). Improving Tweet Timeline Generation by Predicting Optimal Retrieval Depth. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_11

DOI: https://doi.org/10.1007/978-3-319-28940-3_11
Published: 22 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer Science, Computer Science (R0)