Determining the informational, navigational, and transactional intent of Web queries

https://doi.org/10.1016/j.ipm.2007.07.015

Abstract

In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
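The validation step described above, comparing manually coded labels against the automated labels, amounts to a simple agreement ratio. A minimal sketch in Python follows; the function name `accuracy` and the toy labels are assumptions made for illustration and are not drawn from the paper's 400 coded queries.

```python
def accuracy(manual_labels, automatic_labels):
    """Fraction of queries where the automatic label matches the manual code."""
    if len(manual_labels) != len(automatic_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("at least one labeled query is required")
    matches = sum(m == a for m, a in zip(manual_labels, automatic_labels))
    return matches / len(manual_labels)


# Toy data, invented for illustration only (not the paper's data).
manual = ["informational", "navigational", "transactional", "informational"]
automatic = ["informational", "navigational", "informational", "informational"]
print(accuracy(manual, automatic))  # 3 of the 4 labels agree
```

A single ratio of this kind treats every disagreement alike; it does not distinguish outright errors from queries whose intent is vague or multi-faceted.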

Introduction

The World Wide Web (Web) has become an indispensable tool in the daily lives of many people, and search engines provide critical access to Web resources. With nearly 70% of Web searchers using a search engine as their point of entry, the major search engines receive millions of queries per day and present billions of results per week in response to these queries (Sullivan, 2006). Search engines are ‘the tool’ that many people use on a daily basis for accessing the information, Internet sites, services, and other resources on the Web. Although search engines are popular, how are people using them to accomplish their intended goals? How can we determine what it is that these people are actually seeking? What task, need, or goal are these people trying to address with their Web searching?

Belkin (1993) states that one can classify searching episodes in terms of (1) goal of the interaction, (2) method of interaction, (3) mode of retrieval and (4) type of resource interacted with during the search. Web searching certainly possesses these aspects, so Web searching has continuity with earlier searching interactions, such as library systems. However, Web searching differs in three respects (i.e., context, scale, and variety), making it a unique domain of study. The first difference is that the direct availability of content accessible on the Web is nearly ubiquitous. Web search engines provide access to textual and multimedia content in a wide variety of settings including both home and work, as well as in mobile situations. Second, there is the number of searchers attempting to access this content via Web search engines. The scale of topics submitted by these users is surely unparalleled in pre-Web end user searching. Third, the variety of content, users, and systems is certainly unique. This combined diversity on the Web in both content and users is extreme.

In response to this diversity, Web search engines serve a variety of purposes for users. In addition to satisfying information problems, modern Web search engines are navigational tools to take users to specific uniform resource locators (URLs) or to aid in browsing. People use search engines as applications to conduct e-commerce transactions, such as with sponsored search or Google’s payment system. Search engines provide access to content collections of images, songs, and videos rather than directly addressing an information need with a specific object. Search engines provide access to transactional services such as maps, online auctions, driving directions, or even other search engines. Search engines perform social networking functions, as with Yahoo! Answers. Web search engines are spell checkers, thesauruses, and dictionaries. They are games, such as Google Whacking or vanity searching. Modern Web search engines are adding an increasingly diverse range of features. Providers are placing more and highly varied content and services on the Web. In response, people are employing search engines in new and increasingly diverse ways.

It is this cornucopia of alternatives where Web search engines differ most from classic information search and pre-Web retrieval systems. Referring back to facets outlined by Belkin, the method of interaction has remained the same (i.e., enter query, retrieve results, scan results, view results, refine query as needed). The mode of retrieval is similar, albeit within a hypermedia environment (Marchionini, 1995). In terms of goals and type of resources, however, the changes are dramatic. In fact, the facets of goals and range of resources are classic examples of the long tail effect of the Web. Namely, the Web has extended significantly both the range of search goals for people and the range of resources available (Anderson, 2006), and these resources need not be informational. We refer to the type of resource desired in the user’s expression to the system as user intent. Within this great diversity, Web search engines can better assist people in finding the resources they are looking for by more clearly identifying the intent behind the query.

In this research, we developed a methodology to classify user intent in Web searching. We categorized user searches based on intent in terms of the type of content specified by the query and other user expressions, and we operationalized these classifications with defining characteristics. We implemented these categories in a program that automatically classified Web search engine queries. We discuss how one can use this approach to improve Web search engine performance by providing more results in line with searchers’ underlying intent.
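To make the idea of operationalizing intent categories with defining characteristics concrete, the following is a minimal rule-based sketch in Python. The cue lists, the rule order, and the function name `classify_intent` are hypothetical and chosen only for illustration; they are not the characteristics or the program developed in this research.

```python
import re

# Hypothetical cues, for illustration only; not the paper's derived characteristics.
NAVIGATIONAL_PATTERN = re.compile(r"(^www\.|\.(com|org|net|edu|gov)\b)")
TRANSACTIONAL_TERMS = {"download", "buy", "purchase", "order"}


def classify_intent(query: str) -> str:
    """Return 'navigational', 'transactional', or 'informational' for a query."""
    text = query.strip().lower()
    terms = text.split()
    # A URL fragment or domain suffix suggests the user wants a specific site.
    if NAVIGATIONAL_PATTERN.search(text):
        return "navigational"
    # An acquisition term suggests the user wants to obtain or do something.
    if any(term in TRANSACTIONAL_TERMS for term in terms):
        return "transactional"
    # Everything else falls through to the remaining class.
    return "informational"


for example in ["www.ebay.com", "download free screensaver", "how do volcanoes form"]:
    print(example, "->", classify_intent(example))
```

A first-match rule cascade like this assigns exactly one label per query, so it cannot express vague or multi-faceted intent; a probabilistic variant would return a score for each category instead.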

The next section presents related research concerning modeling Web queries.

Section snippets

Related studies

Research aimed at discovering the intent of Web searchers is a growing area of Web research. Determining the underlying intent of user searches has the potential to drastically improve the performance of Web search engines (Gisbergen, Most, & Aelen, 2007), with impact in the areas of information retrieval, data mining, and e-commerce. User intent research falls into three sub-areas: (1) empirical studies and surveys of search engine use, (2) manual analysis of search engine

Research objectives

The research objectives are described below:

  1. Develop a comprehensive classification of Web searching user intent.

    For research objective one, we analyzed prior work in the area, along with numerous actual Web searching transaction logs, in order to develop a detailed categorization of Web searching based on user intent. Given the plethora of categories and classifications, it is difficult to compare results across studies and research experiments. Such a comparison is vitally needed in

Classification of Web searching

For research objective one, we performed a comprehensive review of prior work in the area of user intent in Web searching. We cross-correlated reported results from these studies to align user intent classes that were similar but variously labeled. We also supplemented this literature review with results from our own data analysis. From this review and analysis, we derived a comprehensive categorization of Web searching intent and correlated this categorization with prior published works.

Research objective one

For research objective one (Develop a comprehensive classification of Web searching user intent), we present in Table 2 a three-level hierarchical taxonomy, with the topmost level being informational, navigational, and transactional. Each of these level-one categories has multiple level-two classifications. Some classifications can also involve a third-level classification.

Below this developed taxonomy, Table 2 presents user intent studies and their best-fit classification across studies. The

Discussion and implications

In this study, we employed a three-level classification of Web searching that is useful in identifying the intent of the searcher. This model is based on our own analysis and on prior published work, most notably that of Broder (2002) and Rose and Levinson (2004). However, Broder (2002) did not present a description of the process and metrics used to classify the queries. Similarly, Rose and Levinson (2004) did not elaborate on the details of their classifications. In our work, we have

Conclusion and further research

In order for Web search engines to continue to improve, they must leverage an increased knowledge of user behavior in order to identify the underlying intent of searchers. In this research, we highlighted characteristics of Web queries based on user intent. These characteristics were derived from an examination of Web queries from multiple search engine transaction logs. We have also demonstrated an automated method that can successfully classify Web queries based on user intent. Web search

Acknowledgements

We would like to thank Excite, AlltheWeb.com, AltaVista, and especially Infospace.com for providing the data for this analysis, without which we could not have conducted this research. We encourage other search engine companies to engage members of the academic community in Web searching research. The Air Force Office of Scientific Research (AFOSR) and the National Science Foundation (NSF) funded portions of this research.

References (53)

  • Belkin, N., Cool, C., Kelly, D., Lee, H.-J., Muresan, G., Tang, M.-C., et al. (2003). In Query length in interactive...
  • Bodoff, D. (2004). Relevance for browsing, relevance for searching. Journal of the American Society for Information Science and Technology.
  • Broder, A. (2002). A taxonomy of Web search. SIGIR Forum.
  • Byrne, M., John, B., Wehrle, N., & Crow, D. (1999). In The tangled Web we wove: A taskonomy of WWW use (pp. 544–551)....
  • Carmel, E., Crawford, S., & Chen, H. (1992). In Browsing in hypertext: A cognitive study (pp. 865–884). Paper presented...
  • Chi, E. H., Pirolli, P., Chen, K., & Pitkow, J. (2001). In Using information scent to model user information needs and...
  • Choo, C., & Turnbull, D. (2000). Information seeking on the web: An integrated model of browsing and searching. First...
  • Choo, C., Detlor, B., & Turnbull, D. (1998). In A behavioral model of information seeking on the Web: Preliminary...
  • Croft, W. B., et al. (1987). I3: A new approach to the design of document retrieval systems. Journal of the American Society for Information Science.
  • Cronen-Townsend, S., Zhou, Y., & Croft, W. B. (2002). In Predicting query performance (pp. 299–306). Paper presented at...
  • Dai, H. K., Nie, Z., Wang, L., Zhao, L., Wen, J. -R., & Li, Y. (2006). In Detecting online commercial intention (OCI)...
  • Efthimiadis, E. N. (2000). Interactive query expansion: A user-based evaluation in a relevance feedback environment. Journal of the American Society for Information Science and Technology.
  • Gisbergen, M. S. V., et al. (2007). Visual attention to online search engine results.
  • Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation.
  • Jansen, B. J. (2006). Using temporal patterns of interactions to design effective automated searching assistance systems. Communications of the ACM.
  • Jansen, B. J., et al. (2005). Evaluating the effectiveness of and patterns of interactions with automated searching assistance. Journal of the American Society for Information Science and Technology.