research-article

Cascade or Recency: Constructing Better Evaluation Metrics for Session Search

Authors:
Fan Zhang, Jiaxin Mao, Yiqun Liu, Weizhi Ma, Min Zhang, Shaoping Ma

Tsinghua University, Beijing, China

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2020, Pages 389–398. https://doi.org/10.1145/3397271.3401163

Published: 25 July 2020

ABSTRACT

Session search evaluation has recently received more attention because realistic search scenarios usually involve multiple queries and interactions between users and systems. Having evolved from model-based evaluation metrics for single queries, existing session-based metrics follow a generic framework based on the cascade hypothesis. The cascade hypothesis assumes that lower-ranked search results and later-issued queries receive less attention from users and should therefore be assigned smaller weights when evaluation metrics are calculated. This hypothesis has been highly successful in modeling search users' behavior and designing evaluation metrics, because it explains why users' attention decays on search engine result pages. However, recent studies have found that the recency effect also plays an important role in determining user satisfaction in search sessions. In particular, whether a user feels satisfied with the later-issued queries strongly influences his or her satisfaction with the whole session. To incorporate both the cascade hypothesis and the recency effect into the design of session search evaluation metrics, we propose Recency-aware Session-based Metrics (RSMs), which simultaneously characterize users' examination process with a browsing model and their cognitive process with a utility accumulation model. Using both self-constructed and publicly available user search behavior datasets, we show the effectiveness of the proposed RSMs by comparing them with existing session-based metrics in terms of correlation with user satisfaction. We also find that the influence of the cascade and recency effects varies dramatically among tasks of different difficulty and complexity, which suggests that different model parameters should be used for different types of search tasks. Our findings highlight the importance of investigating and utilizing cognitive effects, in addition to examination hypotheses, in search evaluation.
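The abstract contrasts two ways of weighting the queries in a session: the cascade hypothesis gives earlier queries more weight, while the recency effect gives later queries more weight. The sketch below illustrates only that contrast and is not the RSM definition from the paper. It assumes a rank-biased precision (RBP) style browsing model inside each query, with attention decaying down the ranking, and it uses hypothetical function names (`cascade_weights`, `recency_weights`, `blended_weights`, `session_score`) and hypothetical parameters (`p`, `b`, `lam`, `alpha`) chosen purely for illustration.

```python
from typing import List, Sequence

# Hypothetical input format: one list of graded gains (0..1) per query,
# in the order the queries were issued.
Session = Sequence[Sequence[float]]


def rbp_query_utility(gains: Sequence[float], p: float = 0.8) -> float:
    """RBP-style utility of a single result list.

    The user moves from rank k to rank k+1 with probability p, so the result
    at 0-based rank k is weighted by p**k (attention decays down the ranking).
    """
    return (1 - p) * sum(g * p ** k for k, g in enumerate(gains))


def cascade_weights(n: int, b: float = 0.8) -> List[float]:
    """Cascade assumption across queries: query i (0-based) gets weight b**i,
    so earlier queries count more."""
    return [b ** i for i in range(n)]


def recency_weights(n: int, lam: float = 0.8) -> List[float]:
    """Recency assumption: the last query gets weight 1 and query i gets
    lam**(n - 1 - i), so later queries count more."""
    return [lam ** (n - 1 - i) for i in range(n)]


def blended_weights(n: int, alpha: float = 0.5, b: float = 0.8,
                    lam: float = 0.8) -> List[float]:
    """Hypothetical mixture: alpha=1 is pure cascade, alpha=0 is pure recency."""
    cascade, recency = cascade_weights(n, b), recency_weights(n, lam)
    return [alpha * c + (1 - alpha) * r for c, r in zip(cascade, recency)]


def session_score(session: Session, weights: Sequence[float],
                  p: float = 0.8) -> float:
    """Weighted average of per-query utilities (assumes a non-empty session).

    Normalising by the weight sum keeps the score in [0, 1] for gains in [0, 1].
    """
    utilities = [rbp_query_utility(q, p) for q in session]
    return sum(w * u for w, u in zip(weights, utilities)) / sum(weights)


if __name__ == "__main__":
    # A session that starts with a failed query and ends with a successful one.
    session = [[0, 0, 0], [1, 1, 1]]
    n = len(session)
    print(round(session_score(session, cascade_weights(n)), 4))  # 0.2169
    print(round(session_score(session, recency_weights(n)), 4))  # 0.2711
    print(round(session_score(session, blended_weights(n)), 4))  # 0.244
```

For a session that ends well, the recency weights score it higher than the cascade weights, which mirrors the abstract's observation that satisfaction with later queries strongly shapes satisfaction with the whole session. Because the paper reports that the balance between the two effects varies with task difficulty and complexity, a mixing parameter such as the hypothetical `alpha` would, under this sketch, be fitted separately for each task type.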

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results

Published in
SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN: 9781450380164
DOI: 10.1145/3397271
General Chairs:
Jimmy Huang (York University, Canada), Yi Chang (Jilin University, China), Xueqi Cheng (Chinese Academy of Sciences, China)
Program Chairs:
Jaap Kamps (University of Amsterdam, Netherlands), Vanessa Murdock (Amazon, U.S.A.), Ji-Rong Wen (Renmin University of China, China), Yiqun Liu (Tsinghua University, China)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evaluation metrics
recency effect
session search
user behavior
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%