Abstract
Search behavior, and information-seeking behavior more generally, is often motivated by tasks that prompt search processes that are lengthy, iterative, and intermittent, and that are characterized by distinct stages, shifting goals, and multitasking. Current search systems do not adequately support users tackling complex tasks, so the cognitive burden of keeping track of such tasks falls on the searcher. In this note, we summarize our recent efforts towards extracting search tasks from search logs. Building on recent advances in Bayesian nonparametrics and distributional semantics, we propose novel algorithms to extract tasks and subtasks from a query collection. The models discussed can inform the design of the next generation of task-based search systems that leverage users' task behavior for better support and personalization.
1 Introduction
Search behavior, and information-seeking behavior more generally, is often motivated by tasks that prompt search processes that are lengthy, iterative, and intermittent, and that are characterized by distinct stages, shifting goals, and multitasking. Current search engines do not adequately support complex tasks (e.g., planning a trip or surveying a topic), so the cognitive burden of keeping track of such tasks and completing them falls on the searcher. Ideally, a search engine should be able to decipher the underlying reason that led the user to submit a query (i.e., the actual task that caused the query to be issued) and use this knowledge of the actual information need to guide the user towards completing the task.
In this research, we hypothesize that a comprehensive understanding of users' tasks would enable better support and recommendations based on users' contextual information and, as a result, help users accomplish their tasks. As part of the proposed research, we consider the challenge of extracting tasks from a given collection of search log data and present task extraction techniques that rely on recent advances in Bayesian nonparametrics and word embeddings. We evaluate these techniques using a number of methods based on crowdsourced judgments as well as labelled ground-truth data.
2 Task Based Information Retrieval
Our efforts at developing task-based retrieval systems have focused on three major themes: (i) understanding searchers' behavior, (ii) developing task extraction techniques, and (iii) showing the benefits of task information via improved personalization. We next describe each of them in detail.
2.1 Understanding Searcher’s Task Behavior
While a major share of prior work has considered search sessions as the focal unit of analysis for behavioral insights [7–9], search tasks are emerging as a competing perspective. In recent work [1], we quantify the multi-tasking behavior of web search users and show that over 50% of search sessions contain more than 2 tasks. Further, we provide a method to categorize users as focused, multi-taskers, or supertaskers depending on their level of task-multiplicity, and show that the search effort expended varies across these groups. Additionally, in follow-up work [3], we relate users' multitasking propensities to tasks and topics. Specifically, we analyze the user-disposition, topic, and user-interest level heterogeneities that are prevalent in search task behavior. We find that not only do users have varying propensities to multi-task, they also search for distinct topics across single-task and multi-task sessions. These findings provide useful insights about task-multiplicity in an online search environment and hold potential value for search engines that wish to personalize and support users' search experiences based on their task behavior.
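As a rough illustration of what task-multiplicity means at the session level, the sketch below counts distinct task labels per session and reports the share of sessions with more than two tasks. It assumes each query has already been labelled with a session and a task; the function names and the toy log are hypothetical, and the user categorization of [1] is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tasks_per_session(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct task labels in each session.

    `records` is an iterable of (session_id, task_id) pairs, one per query.
    """
    tasks = defaultdict(set)
    for session_id, task_id in records:
        tasks[session_id].add(task_id)
    return {session: len(labels) for session, labels in tasks.items()}


def share_with_more_tasks(records: Iterable[Tuple[str, str]], more_than: int = 2) -> float:
    """Fraction of sessions whose number of distinct tasks exceeds `more_than`."""
    counts = tasks_per_session(records)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > more_than) / len(counts)


# Hypothetical toy log: (session_id, task_id) for each query.
toy_log = [
    ("s1", "trip"), ("s1", "trip"),
    ("s2", "trip"), ("s2", "news"), ("s2", "recipe"),
    ("s3", "news"), ("s3", "recipe"),
]
multiplicity = tasks_per_session(toy_log)
multitask_share = share_with_more_tasks(toy_log)
```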
2.2 Extracting Hierarchies
An important first step in developing task-based systems is task extraction. In recently published work [4], we considered the challenge of extracting hierarchies of search tasks and their associated subtasks from a search log, using only the log data and no manual annotation. We present an efficient Bayesian nonparametric model for discovering task hierarchies and propose a tree-based Bayesian hierarchical task construction algorithm to discover the rich hierarchical structure embedded within search logs. Our model organises the queries into a nested hierarchy T of tasks and subtasks, with all queries in one node at the root and singleton queries at the leaves. We interpret a tree T as a mixture of partitions over the group of queries Q. We define the probability of such a group of queries as:
\[ p(Q \mid T) = \sum_{\phi (T) \in \mathcal{P}(T)} p(\phi (T)) \, p(Q \mid \phi (T)) \]
where \(\mathcal{P}(T)\) denotes the set of partitions consistent with T, \(p(\phi (T))\) is the mixing proportion of partition \(\phi (T)\), and \(p(Q|\phi (T))\) is the probability of the group of queries Q given a partitioning by \(\phi (T)\). In general, the number of partitions consistent with T can be exponentially large. To make computations tractable, we define the mixture model in such a way that this sum can be computed using dynamic programming over T.
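A recursion of this kind, written here in the form used for Bayesian rose trees, is sketched below. That [4] uses exactly this form is an assumption, and the symbols \(\pi _T\) (the prior probability that all queries under T are kept together in one group) and \(f(Q)\) (the marginal likelihood of the queries in Q under a single task) are introduced only for this sketch:

\[ p(Q \mid T) = \pi _T \, f(Q) + (1 - \pi _T) \prod _{T_i \in children(T)} p(leaves(T_i) \mid T_i) \]

Under this form, the probability at each node depends only on quantities already computed for its children, so a single bottom-up pass over T avoids enumerating the partitions explicitly.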
Initially, each query is regarded as a tree on its own. At each step, the algorithm selects two trees \(T_i\) and \(T_j\) and merges them into a new tree \(T_m\). Unlike binary hierarchical clustering, we allow three possible merging operations: (i) Join: \(T_m = \lbrace T_i, T_j\rbrace \), so that the tree \(T_m\) has two children; (ii) Absorb: \(T_m = \lbrace children(T_i) \cup T_j\rbrace \), i.e., \(T_j\) is absorbed as a further child alongside the children of \(T_i\), forming a tree with more than 2 children; and (iii) Collapse: \(T_m = \lbrace children(T_i) \cup children(T_j)\rbrace \), i.e., all the children of both sub-trees are combined at the same level. This setting allows each task to be composed of an arbitrary number of sub-tasks rather than restricting tasks to binary splits.
The tree is built in a bottom-up greedy agglomerative fashion, and the algorithm finishes when just one tree remains. At each iteration a pair of trees in the forest F is chosen to be merged by considering the pair and type of merger that yields the largest Bayes factor improvement over the current model. Further details of the work are available in our research paper [4].
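The three merge operations and the greedy bottom-up loop can be sketched as follows. This is a minimal illustration, not the implementation from [4]: the `Tree` class, the function names, and in particular `toy_score` are hypothetical. The model described above ranks candidate merges by Bayes factor, whereas `toy_score` is a simple word-overlap stand-in so that the loop can run without a likelihood model.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Tree:
    children: Tuple["Tree", ...] = ()  # empty for a leaf
    query: Optional[str] = None        # set only on leaves

    def leaves(self) -> List[str]:
        if not self.children:
            return [self.query]
        return [q for child in self.children for q in child.leaves()]


def join(ti: Tree, tj: Tree) -> Tree:
    """T_m = {T_i, T_j}: a new node with exactly two children."""
    return Tree(children=(ti, tj))


def absorb(ti: Tree, tj: Tree) -> Tree:
    """T_m = {children(T_i) U T_j}: T_j becomes one more child next to T_i's children."""
    return Tree(children=ti.children + (tj,))


def collapse(ti: Tree, tj: Tree) -> Tree:
    """T_m = {children(T_i) U children(T_j)}: all children end up at the same level."""
    return Tree(children=ti.children + tj.children)


def candidate_merges(ti: Tree, tj: Tree) -> Iterator[Tree]:
    """Yield every merge that is defined for this pair of trees."""
    yield join(ti, tj)
    if ti.children:                  # absorb needs an internal node to absorb into
        yield absorb(ti, tj)
    if tj.children:
        yield absorb(tj, ti)
    if ti.children and tj.children:  # collapse needs two internal nodes
        yield collapse(ti, tj)


Score = Callable[[Tree, Tree, Tree], float]


def build_tree(queries: Sequence[str], score: Score) -> Tree:
    """Greedy agglomerative construction; assumes `queries` is non-empty."""
    forest = [Tree(query=q) for q in queries]
    while len(forest) > 1:
        best = None
        for i, j in combinations(range(len(forest)), 2):
            for merged in candidate_merges(forest[i], forest[j]):
                s = score(merged, forest[i], forest[j])
                if best is None or s > best[0]:
                    best = (s, i, j, merged)
        _, i, j, merged = best
        forest = [t for k, t in enumerate(forest) if k not in (i, j)] + [merged]
    return forest[0]


def toy_score(merged: Tree, ti: Tree, tj: Tree) -> float:
    """Stand-in for the Bayes factor: word overlap, with a small bonus for flatter merges."""
    a = {w for q in ti.leaves() for w in q.split()}
    b = {w for q in tj.leaves() for w in q.split()}
    return len(a & b) / len(a | b) + 0.01 * len(merged.children)


example_queries = ["cheap flights paris", "flights paris", "paris hotels", "python tutorial"]
root = build_tree(example_queries, toy_score)
```

Passing the score as a callback keeps the merge bookkeeping separate from the probabilistic model, so a Bayes-factor score could be substituted without changing the loop.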
2.3 Decomposing Complex Search Tasks
Quite often, search tasks (e.g., planning a trip) are complex and conceptually decompose into a set of sub-tasks (e.g., booking flights, finding places of interest), each of which requires the user to issue multiple queries. Given a collection of on-task queries (extracted using a standard task extraction algorithm), we proposed a distance-dependent Chinese restaurant process (dd-CRP) model to extract these sub-tasks.
In our sub-task extraction problem, each task is associated with a dd-CRP and its tables are embellished with IID draws from a base distribution over mixture component parameters. Let \(z_i\) denote the ith query assignment, i.e., the index of the query to which the ith query is linked. Let \(d_{ij}\) denote the distance measurement between queries i and j, let D denote the set of all distance measurements between queries, and let f be a decay function. The distance-dependent CRP independently draws the query assignments to sub-tasks conditioned on the distance measurements:
\[ p(z_i = j \mid D, \alpha ) \propto \begin{cases} f(d_{ij}) & \text{if } j \ne i, \\ \alpha & \text{if } j = i. \end{cases} \]
Here, \(d_{ij}\) is an externally specified distance between queries i and j, and \(\alpha \) determines the probability that a customer links to themselves rather than to another customer. Given a decay function f, distances between queries D, scaling parameter \(\alpha \), and an exchangeable Dirichlet distribution with parameter \(\lambda \), N M-word queries are drawn as follows:
1. For \(i \in [1, N]\), draw \(z_i \sim \text{dist-CRP}(\alpha , f, D)\).
2. For \(i \in [1, N]\),
   (a) If \(z_i \notin R^{*}_{q_{1:N}}\), set the parameter for the ith query to \(\theta _i = \theta _{q_i}\). Otherwise draw the parameter from the base distribution, \(\theta _i \sim \text{Dirichlet}(\lambda )\).
   (b) Draw the ith query terms, \(w_i \sim \text{Mult}(M, \theta _i)\).
Further details of the work are available in our research paper [2].
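The generative process above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model code from [2]: the function names, the exponential decay, and the toy distances are hypothetical. Sub-tasks are read off as connected components of the sampled links, and one symmetric Dirichlet draw is shared by all queries in a component.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_links(dist: Sequence[Sequence[float]], alpha: float,
                 decay: Callable[[float], float], rng: random.Random) -> List[int]:
    """Draw z_i for every query: weight alpha for a self-link, f(d_ij) otherwise."""
    n = len(dist)
    links = []
    for i in range(n):
        weights = [alpha if j == i else decay(dist[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def links_to_subtasks(links: Sequence[int]) -> List[int]:
    """Label each query with its sub-task: the connected component of the link graph."""
    parent = list(range(len(links)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(links))]


def sample_queries(labels: Sequence[int], vocab: Sequence[str], m_words: int,
                   lam: float, rng: random.Random) -> List[List[str]]:
    """Draw M words per query; queries in the same sub-task share one theta."""
    thetas = {}
    queries = []
    for label in labels:
        if label not in thetas:
            # Symmetric Dirichlet(lam) draw via normalised Gamma variates.
            g = [rng.gammavariate(lam, 1.0) for _ in vocab]
            total = sum(g)
            thetas[label] = [x / total for x in g]
        queries.append(rng.choices(vocab, weights=thetas[label], k=m_words))
    return queries


rng = random.Random(0)
toy_dist = [[0.0, 0.2, 3.0], [0.2, 0.0, 3.0], [3.0, 3.0, 0.0]]  # hypothetical distances
links = sample_links(toy_dist, alpha=0.5, decay=lambda d: math.exp(-d), rng=rng)
subtasks = links_to_subtasks(links)
words = sample_queries(subtasks, ["flight", "hotel", "museum", "visa"], m_words=3, lam=0.5, rng=rng)
```

Because the links are drawn independently, the number of sub-tasks is not fixed in advance; it emerges from how the links happen to connect the queries.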
2.4 Task Based Personalization
To demonstrate the usefulness of a task-based system, in recent work [5, 6] we presented a novel approach that couples users' topical interest information with their search task information and their term usage behavior to learn a joint user representation. We demonstrated that coupling users' task information with their topical interests helps build better user models. We show through extensive experimentation that our task-based method outperforms existing query-term-based and topical-interest-based user representation methods. By evaluating our approach on a variety of personalisation tasks, including collaborative query recommendation, cluster-based recommendation, and user cohort analysis, we demonstrate that the proposed methods result in better user profiles.
3 Conclusion
In this note, we offered insights about the shift in focus from sessions to tasks and presented a brief summary of our recent work aimed at extracting tasks from search logs. We believe that task-based personalization and recommendation have the potential to shape the future of user interaction systems for the upcoming era of the intelligent Web, and there is much to be done on this emerging topic. Key problems to investigate in the future include using task-based systems for improved recommendations and better predicting users' contextual needs for proactive recommendations.
References
1. Mehrotra, R., Bhattacharya, P., Yilmaz, E.: Characterizing users’ multi-tasking behavior in web search. In: Proceedings of the ACM on Conference on Human Information Interaction and Retrieval (2016)
2. Mehrotra, R., Bhattacharya, P., Yilmaz, E.: Deconstructing complex search tasks: a Bayesian nonparametric approach for extracting sub-tasks. In: Proceedings of NAACL-HLT, pp. 599–605 (2016)
3. Mehrotra, R., Bhattacharya, P., Yilmaz, E.: Sessions; tasks & topics - uncovering behavioral heterogeneities in online search behavior. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM (2016)
4. Mehrotra, R., Yilmaz, E.: Towards hierarchies of search tasks & subtasks. In: WWW (2015)
5. Mehrotra, R., Yilmaz, E.: Terms, topics & tasks: enhanced user modelling for better personalization. In: Proceedings of the International Conference on the Theory of Information Retrieval, pp. 131–140. ACM (2015)
6. Mehrotra, R., Yilmaz, E., Verma, M.: Task-based user modelling for personalization via probabilistic matrix factorization. In: RecSys Posters (2014)
7. Odijk, D., White, R.W., Hassan Awadallah, A., Dumais, S.T.: Struggling and success in web search. In: CIKM (2015)
8. White, R.W., Bennett, P.N., Dumais, S.T.: Predicting short-term interests using activity-based search context. In: CIKM (2010)
9. Xiang, B., Jiang, D., Pei, J., Sun, X., Chen, E., Li, H.: Context-aware ranking in web search. In: SIGIR (2010)
© 2016 Springer International Publishing AG
Mehrotra, R., Yilmaz, E. (2016). Query Log Mining for Inferring User Tasks and Needs. In: Berendt, B., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2016. Lecture Notes in Computer Science(), vol 9853. Springer, Cham. https://doi.org/10.1007/978-3-319-46131-1_36
DOI: https://doi.org/10.1007/978-3-319-46131-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46130-4
Online ISBN: 978-3-319-46131-1
eBook Packages: Computer Science, Computer Science (R0)