research-article

Online In-Situ Interleaved Evaluation of Real-Time Push Notification Systems

Authors:

Jimmy Lin

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 415 - 424

https://doi.org/10.1145/3077136.3080808

Published: 07 August 2017

Abstract

Real-time push notification systems monitor continuous document streams such as social media posts and alert users to relevant content directly on their mobile devices. We describe a user study of such systems in the context of the TREC 2016 Real-Time Summarization Track, where system updates are immediately delivered as push notifications to the mobile devices of a cohort of users. Our study represents, to our knowledge, the first deployment of an interleaved evaluation framework for prospective information needs, and also provides an opportunity to examine user behavior in a realistic setting. Results of our online in-situ evaluation are correlated against the results of a more traditional post-hoc batch evaluation. We observe substantial correlations between many online and batch evaluation metrics, especially for those that share the same basic design (e.g., are utility-based). For some metrics, we observe little correlation, but we are able to identify the volume of messages that a system pushes as one major source of differences.
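The comparison of online and batch results described in the abstract can be illustrated with a rank correlation over per-system scores. The following is a minimal sketch, not the authors' analysis: the abstract does not name the correlation statistic used, so Kendall's tau is an assumption chosen for illustration, and the system names and scores are invented placeholder values.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length lists of scores.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-system scores (assumed values, not from the paper):
# one online in-situ metric and one post-hoc batch metric.
online_scores = {"sysA": 0.62, "sysB": 0.48, "sysC": 0.55, "sysD": 0.30}
batch_scores = {"sysA": 0.41, "sysB": 0.35, "sysC": 0.29, "sysD": 0.22}

systems = sorted(online_scores)
tau = kendall_tau(
    [online_scores[s] for s in systems],
    [batch_scores[s] for s in systems],
)
print(f"Kendall's tau between online and batch rankings: {tau:.3f}")
```

A value near 1 would mean the two evaluations rank systems almost identically, while a value near 0 would mean the rankings are largely unrelated.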



Index Terms

Online In-Situ Interleaved Evaluation of Real-Time Push Notification Systems
1. Information systems
  1. Information retrieval


Published In


SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 2017

1476 pages

ISBN: 9781450350228

DOI: 10.1145/3077136

General Chairs: Noriko Kando (National Institute of Informatics), Tetsuya Sakai (Waseda University), Hideo Joho (University of Tsukuba)

Program Chairs: Hang Li (Huawei Noah's Ark Lab), Arjen P. de Vries (Radboud University), Ryen W. White (Microsoft Cortana)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Natural Sciences and Engineering Research Council of Canada

Conference

SIGIR '17

Sponsor:

SIGIR

SIGIR '17: The 40th International ACM SIGIR conference on research and development in Information Retrieval

August 7 - 11, 2017

Tokyo, Shinjuku, Japan

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

