ABSTRACT
In web search, relevance is one of the most important factors in user satisfaction, and the success of a web search engine depends heavily on its relevance performance. Many hard cases in search relevance have been observed to stem from term mismatch between query and document (e.g., the query 'ny times' does not match well with a document that contains only 'new york times'), so it is no exaggeration to say that dealing with mismatch between query and document is one of the most critical research problems in web search. Researchers have recently devoted significant effort to this grand challenge. The major approach is to perform deeper query and document understanding and then to match the enriched query and document representations more effectively. With the availability of large amounts of log data and advanced machine learning techniques, this approach has become more feasible, and significant progress has been made.
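The 'ny times' example can be made concrete with a minimal sketch. It assumes a naive term-overlap score and a tiny hand-written rewrite table; `term_overlap`, `expand_query`, and `REWRITES` are hypothetical names introduced only for illustration and are not the methods covered in the tutorial, which learn such mappings from log data.

```python
# Illustrative sketch only: a naive exact-term matcher and a hand-written
# rewrite table (both assumptions, not taken from the tutorial).

def term_overlap(query: str, document: str) -> float:
    """Return the fraction of query terms that also occur in the document."""
    query_terms = query.lower().split()
    document_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    matched = sum(term in document_terms for term in query_terms)
    return matched / len(query_terms)


# Hypothetical rewrite table mapping an abbreviation to its expansion.
REWRITES = {"ny": ["new", "york"]}


def expand_query(query: str) -> str:
    """Replace each query term with its expansion when one is known."""
    expanded = []
    for term in query.lower().split():
        expanded.extend(REWRITES.get(term, [term]))
    return " ".join(expanded)


document = "the new york times homepage"
raw_score = term_overlap("ny times", document)
rewritten_score = term_overlap(expand_query("ny times"), document)
print(raw_score, rewritten_score)
```

Under exact term matching only 'times' is found in the document, so the raw query scores lower than the rewritten one, which matches on all of its terms. A fixed table like this does not scale, which is why the approaches surveyed here rely on learned query and document representations.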
In this tutorial, we will give a systematic and detailed presentation of newly developed machine learning technologies for query-document matching in search. We will focus on the fundamental problems, as well as novel solutions for query-document matching at the word form level, word sense level, topic level, and structure level. We will cover novel technologies for query spelling error correction [3, 13], query rewriting [1, 4, 6, 7], query classification [2], topic modeling of documents [5, 9], query-document matching [8, 10, 11, 12], and query-to-document-title translation. The ideas and solutions introduced in this tutorial may motivate industrial practitioners to turn research results into products. The summary of state-of-the-art methods and the discussion of technical issues may stimulate academic researchers to find new research directions and solutions.
Matching between query and document is not limited to search; similar problems arise in online advertising, recommender systems, and other applications, where they take the form of matching between objects from two spaces. The technologies we introduce can be generalized into broader machine learning techniques, which we call learning to match.
- M. Bendersky, D. Metzler, and W. B. Croft. Parameterized concept weighting in verbose queries. In SIGIR, pp. 605--614, 2011.
- P. N. Bennett, K. Svore, and S. T. Dumais. Classification-enhanced ranking. In WWW, pp. 111--120, 2010.
- E. Brill and R. C. Moore. An improved error model for noisy channel spelling correction. In ACL, 2000.
- A. Broder, P. Ciccolo, E. Gabrilovich, V. Josifovski, D. Metzler, L. Riedel, and J. Yuan. Online expansion of rare queries for sponsored search. In WWW, pp. 511--520, 2009.
- J. Gao, K. Toutanova, and W.-t. Yih. Clickthrough-based latent semantic models for web search. In SIGIR, pp. 675--684, 2011.
- J. Guo, G. Xu, X. Cheng, and H. Li. Named entity recognition in query. In SIGIR, pp. 267--274, 2009.
- D. Sheldon, M. Shokouhi, M. Szummer, and N. Craswell. LambdaMerge: merging the results of query reformulations. In WSDM, pp. 795--804, 2011.
- C. Wang, R. Raina, D. Fong, D. Zhou, J. Han, and G. Badros. Learning relevance from heterogeneous social network and its application in online targeting. In SIGIR, pp. 655--664, 2011.
- Q. Wang, J. Xu, H. Li, and N. Craswell. Regularized latent semantic indexing. In SIGIR, pp. 685--694, 2011.
- Z. Wang, G. Xu, H. Li, and M. Zhang. A fast and accurate method for approximate string search. In ACL-HLT, pp. 52--61, 2011.
- W. Wu, H. Li, J. Xu, and S. Oyama. Learning a robust relevance model for search using kernel methods. JMLR, pp. 1429--1458, 2011.
- J. Xu, W. Wu, H. Li, and G. Xu. A kernel approach to addressing term mismatch. In WWW, pp. 153--154, 2011.
- J. Xu and G. Xu. Learning similarity function for rare queries. In WSDM, pp. 615--624, 2011.
Index Terms
- Machine learning for query-document matching in search