Implicit user behaviours to improve post-retrieval document relevancy

doi:10.1016/j.chb.2014.01.001

Computers in Human Behavior

Volume 33, April 2014, Pages 104-112

Highlights

• The study improved document relevancy based on users’ implicit behaviours.
• Dwell time, click-through, text selection and page review data were collected.
• Feedback was integrated and documents were re-ranked using a logarithmic function.
• The integrated technique performed significantly better than the baseline and the single-feedback models.

Abstract

The collection of user feedback as an indication of users’ interests has led to growing interest in improving users’ search experiences. In this article, we describe a method that integrates multiple implicit feedback approaches to unobtrusively monitor users’ interactions and improve the relevancy of document search results. The study gathered users’ feedback based on dwell time, click-through data, page review and text selection. An experiment was conducted to assess the performance of the proposed integrated model. Collected data were analysed and compared at three ranking levels, that is, the top 10, 15 and 25 documents. Both the mean average precision and normalised discounted cumulative gain values indicate that the integrated model significantly outperforms the baseline (TF-IDF) at each level. Moreover, a comparison across all the models shows that the integrated model has the best search performance, further indicating that merging multiple feedback techniques improves overall document relevancy. Results also show that page review and text selection have the lowest and highest precision, respectively, among the four implicit feedback models; however, the differences are not significant. Overall, it can be concluded that integrated implicit feedback significantly improves post-retrieval document relevancy compared with stand-alone feedback and with no feedback at all.

Introduction

With the ever-growing volume of information on the Web, finding high-quality, relevant information within a large collection of texts is challenging. The tremendous growth of both information and usage has introduced an information overload problem in which users find it increasingly difficult to locate the right information at the right time. This phenomenon has prompted widespread research on information retrieval, the science of searching for information in a large collection of documents. Information retrieval deals with the storage, representation and organisation of, and access to, information items (Baeza-Yates & Ribeiro-Neto, 1999). A typical example of an information retrieval system is a web search engine such as Google.

Building effective mechanisms to improve search performance is a challenge, and studies have explored various techniques, including the use of users’ feedback. Explicit feedback requires users to state their preferences directly, for example by specifying keywords, commenting, answering questions, rating or ranking. This approach requires users to engage in additional activities beyond their normal searching behaviour, resulting in a higher user cost (time and effort). Implicit feedback, which estimates users’ preferences from their interactions and behaviours, such as dwell time (i.e. the amount of time spent on a page), scrolling, printing and bookmarking, is therefore more appealing. User feedback has been shown to improve search result relevancy, and many of these studies have combined multiple feedback approaches (i.e. more than one implicit or explicit feedback technique) (Fox et al., 2005; Claypool et al., 2001; Morita and Shinoda, 1994; Joachims et al., 2007; Liu et al., 2011; White et al., 2010; Buscher et al., 2012; Guo and Agichtein, 2012), or have combined both implicit and explicit approaches into a hybrid model (Liu et al., 2010; Núñez-Valdéz et al., 2012; Park, 2013; Zhu et al., 2012).

The focus of our study is to explore how multiple implicit feedback techniques can be integrated to improve post-retrieval document relevancy. Although much work has been done on combining implicit feedback, the current study differs in that we propose a re-ranking algorithm which takes into account two common techniques (i.e. dwell time and click-through) and two other techniques, namely page review (i.e. returning to the same document) and text selection. Text selection and page review are particularly interesting additions as they measure a user’s post-click behaviour. The study also compares the performance of each feedback technique on its own with that of the integrated model. The main research question of the current study is therefore “Does integrating dwell time, click-through, page review and text selection improve the search performance for a query?” To evaluate the prediction accuracy of our proposed method, an experiment was conducted using a self-developed prototype search engine. Implicit feedback was gathered while users interacted with the list of ranked results on the search engine results page (SERP). These data were then fed into the re-ranking algorithm to improve document relevancy. The overall evaluation results were analysed and compared with the baseline algorithm (TF-IDF). In addition, the results were compared among the various implicit feedback models. All comparisons were made based on the top 10, 15 and 25 documents.
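The flow described above (rank by TF-IDF, capture the four implicit signals, then re-rank) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm proposed in the paper: the `Feedback` field names, the equal weighting of signals and the use of log(1 + x) to damp each signal are all hypothetical. The highlights mention a logarithmic function, but its exact form is not given in this text.

```python
import math
from dataclasses import dataclass


@dataclass
class Feedback:
    """Implicit signals recorded for one document (field names are hypothetical)."""
    dwell_seconds: float = 0.0  # time spent on the page
    clicks: int = 0             # click-through count
    revisits: int = 0           # page review: returns to the same document
    selections: int = 0         # text selection events


def feedback_boost(fb):
    # ASSUMPTION: equal weights and log(1 + x) damping; not the paper's formula.
    signals = (fb.dwell_seconds, fb.clicks, fb.revisits, fb.selections)
    return sum(math.log1p(value) for value in signals)


def rerank(tfidf_scores, feedback):
    """Return document ids sorted by TF-IDF score plus the feedback boost."""
    combined = {
        doc_id: score + feedback_boost(feedback.get(doc_id, Feedback()))
        for doc_id, score in tfidf_scores.items()
    }
    # Highest combined score first; ties keep the original order.
    return sorted(combined, key=combined.get, reverse=True)
```

With no recorded feedback the boost is zero for every document, so the order falls back to the TF-IDF ranking, which mirrors the "no feedback available" baseline case.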

The remainder of the paper is structured as follows. The next section describes implicit feedback and some of the notable works in this area. The research methodology is then presented, with explanations of the re-ranking algorithm, evaluation metrics and experimental setup. This is followed by the results and discussion. Section 5 concludes the paper.

Section snippets

Implicit feedback

Implicit feedback unobtrusively obtains information about users’ behaviour by watching their interactions with the system. Common techniques used to gather implicit feedback include dwell time, saving, scrolling, bookmarking, printing and click-through, among others. Compared to explicit feedback, inferences drawn from implicit feedback are considered to be less reliable; however, large quantities of data can be gathered implicitly without requiring any additional activity from the users (Jung,

Research methodology

Fig. 1 illustrates the proposed method used in improving the document relevance in this study. Assuming a user performs a new search via a query, a SERP is returned to the user based on the classical information retrieval algorithm, TF-IDF (further details in Salton & Buckley, 1988). As the user interacts with the results displayed, his/her behaviours are captured implicitly (i.e. dwell time, click-through, page review and text selection) and then fed into the re-ranking algorithm which sorts
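As a rough illustration of the baseline ranking step, the following Python sketch scores each document by the summed TF-IDF weight of the query terms. The weighting variant (raw term frequency times log(N/df)), the whitespace tokenisation and the function name `tfidf_rank` are assumptions made for this example; Salton & Buckley (1988) discuss the weighting schemes in detail, and the prototype's actual variant is not stated here.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents for a query by the summed TF-IDF weight of its terms.

    Illustrative weighting only: raw term frequency times log(N / df).
    Returns (document index, score) pairs, best first.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for idx, tokens in enumerate(tokenised):
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query.lower().split()
            if df[term] > 0  # skip query terms absent from the collection
        )
        scores.append((idx, score))
    # Highest score first; the stable sort keeps document order on ties.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

The returned order is what would populate the initial SERP before any implicit feedback has been observed.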

Results and discussion

The models were compared in three ways: (i) TF-IDF_Integrated versus TF-IDF, (ii) each of the four implicit feedback models versus TF-IDF, and (iii) all five models that use implicit feedback against one another.
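According to the abstract, these comparisons rely on mean average precision and normalised discounted cumulative gain at cut-offs of 10, 15 and 25 documents. A minimal Python sketch of both metrics at a cut-off k follows. It makes two assumptions that the text does not confirm: average precision is normalised by the number of relevant documents found within the top k, and the ideal ordering for NDCG is derived from the same judged list.

```python
import math


def average_precision_at_k(relevance, k):
    """Average precision over the top-k results.

    relevance: list of 0/1 flags in ranked order.
    ASSUMPTION: normalised by the number of relevant results within the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs, k):
    """Mean of average precision at k over several queries."""
    return sum(average_precision_at_k(run, k) for run in runs) / len(runs)


def ndcg_at_k(gains, k):
    """NDCG over the top-k results; gains are graded relevance values in ranked order."""
    def dcg(values):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(values, start=1))

    # Ideal DCG: the same judged gains sorted from best to worst.
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0
```

Calling each function with k set to 10, 15 and 25 on the same ranked list gives the three ranking levels used in the comparisons.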

Conclusion

In this paper, we proposed an integrated implicit feedback model to improve post-retrieval document relevancy. The techniques combined were dwell time, click-through, page review and text selection. Our technique first ranks and presents a list of results for a query based on the classical TF-IDF algorithm. As the user interacts with the system, his/her pattern of interaction (i.e. implicit feedback) is captured. A re-ranking algorithm then incorporates the captured interactions to re-rank

Acknowledgements

The authors wish to thank the University of Malaya (RG103-12ICT) for supporting this study. Gratitude also goes to all the students who participated in the user testing.

References (32)

N.L. Beebe et al. (2011). Post-retrieval search hit clustering to improve information retrieval effectiveness: Two digital forensics case studies. Decision Support Systems.
S. Jung et al. (2007). Click data as implicit relevance feedback in web search. Information Processing & Management.
Y. Liu et al. (2011). How do users describe their information need: Query recommendation based on snippet click model. Expert Systems with Applications.
E.R. Núñez-Valdéz et al. (2012). Implicit feedback techniques on recommender systems applied to electronic books. Computers in Human Behavior.
Y.-J. Park (2013). An adaptive match-making system reflecting the explicit and implicit preferences of users. Expert Systems with Applications.
G. Salton et al. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management.
L. Tauscher et al. (1997). How people revisit web pages: Empirical findings and implications for the design of history systems. International Journal of Human-Computer Studies – Special issue: World Wide Web Usability.
Y. Zhu et al. (2012). User interest modeling and self-adaptive update using relevance feedback technology. Procedia Engineering.
Agichtein, E., Brill, E., & Dumais, S. (2006). Improving web search ranking by incorporating user behaviour...
Ahn, J., Brusilovsky, P., He, D., Grady, J., & Li, Q. (2008). Personalized web exploration with task models. In...
R. Baeza-Yates et al. (1999). Modern information retrieval.
Z.A.M. Bidoki et al. (2010). A3CRank: An adaptive ranking method based on connectivity, content and click-through data. Information Processing & Management.
Buscher, G., White, R. W., Dumais, S., & Huang, J. (2012). Large-scale analysis of individual and task differences in...
M. Claypool et al. (2001). Inferring user interest. IEEE Internet Computing.
Dou, Z., Song, R., Yuan, X., & Wen, J. (2008). Are click-through data adequate for learning web search rankings? In...
S. Fox et al. (2005). Evaluating implicit measures to improve web search. ACM Transactions on Information Systems.

Cited by (11)

Fuzzy rule based profiling approach for enterprise information seeking and retrieval
2017, Information Sciences
Citation Excerpt :
This was proven by comparing observational studies with explicit interest measures [37,50,59,51]. Current research shows that the combination of several relevance feedback parameters can produce better results [37,79,15,10,23]. It was found that reading time, along with other user behaviour can be a very reliable indicator of content relevancy.
With the exponential growth of information available on the Internet and various organisational intranets there is a need for profile based information seeking and retrieval (IS&R) systems. These systems should be able to support users with their context-aware information needs. This paper presents a new approach for enterprise IS&R systems using fuzzy logic to develop task, user and document profiles to model user information seeking behaviour. Relevance feedback was captured from real users engaged in IS&R tasks. The feedback was used to develop a linear regression model for predicting document relevancy based on implicit relevance indicators. Fuzzy relevance profiles were created using Term Frequency and Inverse Document Frequency (TF-IDF) analysis for the successful user queries. Fuzzy rule based summarisation was used to integrate the three profiles into a unified index reflecting the semantic weight of the query terms related to the task, user and document. The unified index was used to select the most relevant documents and experts related to the query topic. The overall performance of the system was evaluated based on standard precision and recall metrics which show significant improvements in retrieving relevant documents in response to user queries.
Comparative analysis of relevance feedback methods based on two user studies
2016, Computers in Human Behavior
Citation Excerpt :
Similar success was reported by Shapira, Taieb-Maimon and Moskowitz (Shapira, Taieb-Maimon, & Moskowitz, 2006). Balakrishnan and Zhang (Balakrishnan & Zhang, 2014) examined the effect of some implicit indicators on post-retrieval document relevancy. They found that a combination of text selection, dwell time, click-through and page review post-click behaviour can improve the precision of relevance feedback.
Rigorous analysis of user interest in web documents is essential for the development of recommender systems. This paper investigates the relationship between the implicit parameters and user explicit rating during their search and reading tasks. The objective of this paper is therefore three-fold: firstly, the paper identifies the implicit parameters which are statistically correlated with the user explicit rating through user study 1. These parameters are used to develop a predictive model which can be used to represent users’ perceived relevance of documents. Secondly, it investigates the reliability and validity of the predictive model by comparing it with eye gaze during a reading task through user study 2. Our findings suggest that there is no significant difference between the predictive model based on implicit indicators and eye gaze within the context examined. Thirdly, we measured the consistency of user explicit rating in both studies and found significant consistency in user explicit rating of document relevance and interest level which further validates the predictive model. We envisage that the results presented in this paper can help to develop recommender and personalised systems for recommending documents to users based on their previous interaction with the system.
How does the first buggy file work well for iterative IR-based bug localization?
2022, Proceedings of the ACM Symposium on Applied Computing
Gender differentials and implicit feedback on online video content: enhancing user interest evaluation
2019, Industrial Management and Data Systems
Context-aware adaptive m-learning: Implicit indicators of learning performance, perceived usefulness, and willingness to use
2019, Computers in Education Journal
A novel hybrid knowledge retrieval approach for online customer service platforms
2018, 26th European Conference on Information Systems: Beyond Digitization - Facets of Socio-Technical Change, ECIS 2018
