DOI: 10.1145/3077136.3080795

Evaluating Mobile Search with Height-Biased Gain

Published: 07 August 2017

Abstract

Mobile search engine result pages (SERPs) are becoming highly visual and heterogeneous. Unlike the traditional ten-blue-link SERPs of desktop search, different verticals and cards occupy different amounts of space on the small screen. Traditional retrieval measures, which treat the SERP as a ranked list of homogeneous items, are therefore inadequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have roughly equal heights, so traditional rank-based decaying functions are inadequate for mobile search metrics. (2) For some types of verticals and cards, the information the user seeks is already embedded in the snippet, making a click through to the landing page unnecessary. (3) For results with complex sub-components (which usually have large heights), the full gain of a result is not obtained if the user reads only part of its content; the benefit a result brings depends on the user's reading behavior, so the internal gain distribution over the result's height should be modeled for a more accurate estimate. To tackle these problems, we conduct a lab-based user study to construct a suitable user behavior model for mobile search evaluation. We find that the geometric height of a user's browsing trail is a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain (HBG), computed by summing the product of a gain distribution and a discount factor, both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines.
Experimental results show that HBG agrees with user preferences 85.33% of the time, better than all existing metrics.
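The abstract describes HBG as a sum, over the results on a SERP, of a height-modeled gain distribution times a height-modeled discount. The paper's exact gain and discount functions are not reproduced here, so the sketch below is only illustrative: it assumes (hypothetically) that each result's gain is spread uniformly over its height and that attention decays geometrically with scrolled height, with an assumed `half_life` parameter that is not from the paper.

```python
import math

def height_biased_gain(results, half_life=10.0):
    """Illustrative height-biased metric.

    results: list of (gain, height) pairs, top to bottom of the SERP.
    half_life: assumed scroll depth (in height units) at which the
               discount halves -- a stand-in for the paper's decay model.
    """
    decay_rate = math.log(2) / half_life  # continuous decay per unit height
    total, top = 0.0, 0.0                 # accumulated gain; current vertical offset
    for gain, height in results:
        # Integrate the uniform gain density (gain / height) times the
        # discount exp(-decay_rate * y) over the span [top, top + height].
        density = gain / height
        total += density / decay_rate * (
            math.exp(-decay_rate * top)
            - math.exp(-decay_rate * (top + height))
        )
        top += height
    return total
```

Under this sketch, a tall irrelevant card placed above a relevant result pushes the relevant result deeper and reduces its discounted contribution, which is the height-based (rather than rank-based) discounting the abstract motivates.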





Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN:9781450350228
DOI:10.1145/3077136
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. evaluation metric
  2. mobile search
  3. user behavior

Qualifiers

  • Research-article


Conference

SIGIR '17

Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%)
Overall acceptance rate: 792 of 3,983 submissions (20%)


Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 2

Reflects downloads up to 27 Feb 2025


Cited By

  • (2023) Practice and Challenges in Building a Business-oriented Search Engine Quality Metric. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3295-3299. DOI: 10.1145/3539618.3591841 (online 19 Jul 2023)
  • (2023) Formally Modeling Users in Information Retrieval. A Behavioral Economics Approach to Interactive Information Retrieval, 23-64. DOI: 10.1007/978-3-031-23229-9_2 (online 18 Feb 2023)
  • (2022) From linear to non-linear: investigating the effects of right-rail results on complex SERPs. Advances in Computational Intelligence 2(1). DOI: 10.1007/s43674-021-00028-2 (online 10 Jan 2022)
  • (2022) Offline recommender system evaluation. AI Magazine 43(2), 225-238. DOI: 10.1002/aaai.12051 (online 16 Jun 2022)
  • (2020) Cascade or Recency. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 389-398. DOI: 10.1145/3397271.3401163 (online 25 Jul 2020)
  • (2020) Fundamental Limits on the Regret of Online Network-Caching. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(2), 1-31. DOI: 10.1145/3392143 (online 12 Jun 2020)
  • (2020) Mechanism Design for Online Resource Allocation. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(2), 1-46. DOI: 10.1145/3392142 (online 12 Jun 2020)
  • (2020) A Survey of Figurative Language and Its Computational Detection in Online Social Networks. ACM Transactions on the Web 14(1), 1-52. DOI: 10.1145/3375547 (online 7 Feb 2020)
  • (2020) Data-Driven Evaluation Metrics for Heterogeneous Search Engine Result Pages. Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 213-222. DOI: 10.1145/3343413.3377959 (online 14 Mar 2020)
  • (2020) Evaluation of Information Access with Smartphones. Evaluating Information Retrieval and Access Tasks, 151-167. DOI: 10.1007/978-981-15-5554-1_11 (online 2 Sep 2020)
