tutorial

Machine learning for query-document matching in search

Authors:
Hang Li, Microsoft Research Asia, Beijing, China
Jun Xu, Microsoft Research Asia, Beijing, China

WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining, February 2012, Pages 767–768, https://doi.org/10.1145/2124295.2124393

Published: 08 February 2012

ABSTRACT

In web search, relevance is one of the most important factors in user satisfaction, and the success of a web search engine depends heavily on its relevance performance. It has been observed that many hard cases in search relevance are due to term mismatch between query and document (e.g., the query 'ny times' does not match well with a document containing only 'new york times'), and thus it is no exaggeration to say that dealing with mismatch between query and document is one of the most critical research problems in web search. Recently, researchers have spent significant effort on addressing this grand challenge. The major approach is to conduct more query and document understanding and to perform better matching between the enriched query and document representations. With the availability of large amounts of log data and advanced machine learning techniques, this has become more feasible, and significant progress has been made recently.
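
The 'ny times' example can be made concrete with a minimal Python sketch. It is an illustration only, not a method from the tutorial: the overlap measure, the function names, and the hand-written `REWRITES` table are all assumptions introduced here. In practice, such rewrites would be learned from log data.

```python
def term_overlap(query: str, document: str) -> float:
    """Fraction of query terms that occur verbatim in the document."""
    query_terms = query.lower().split()
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return sum(t in doc_terms for t in query_terms) / len(query_terms)


# Hypothetical rewrite table (an assumption for illustration); a real
# system would learn such mappings rather than hard-code them.
REWRITES = {"ny": "new york"}


def rewrite(query: str) -> str:
    """Expand each query term using the assumed rewrite table."""
    return " ".join(REWRITES.get(t, t) for t in query.lower().split())


doc = "new york times"
print(term_overlap("ny times", doc))           # 0.5: 'ny' has no exact match
print(term_overlap(rewrite("ny times"), doc))  # 1.0: all rewritten terms match
```

Exact term matching credits only 'times', while the rewritten query matches every term of the document.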

In this tutorial, we will give a systematic and detailed presentation of newly developed machine learning technologies for query-document matching in search. We will focus on the fundamental problems, as well as novel solutions, for query-document matching at the word form level, word sense level, topic level, and structure level. We will cover novel technologies for query spelling error correction [3, 13], query rewriting [1, 4, 6, 7], query classification [2], topic modeling of documents [5, 9], query-document matching [8, 10, 11, 12], and query document-title translation. The ideas and solutions introduced in this tutorial may motivate industrial practitioners to turn research results into products. The summary of state-of-the-art methods and the discussion of technical issues may stimulate academic researchers to find new research directions and solutions.

Matching between query and document is not limited to search; similar problems arise in online advertising, recommender systems, and other applications, where the task is to match objects from two spaces. The technologies we introduce can be generalized into more general machine learning techniques, which we call learning to match.
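
One common way to formalize matching between objects from two spaces is a bilinear score x^T W y, where x and y are feature vectors from the two spaces and W is a learned matrix. The abstract does not specify this form, so the Python sketch below is an assumption for illustration; `match_score` and the toy values are hypothetical, and W is set by hand rather than learned.

```python
def match_score(x, W, y) -> float:
    """Bilinear score x^T W y between vectors from two different spaces."""
    return sum(
        x_i * w_ij * y_j
        for x_i, row in zip(x, W)
        for w_ij, y_j in zip(row, y)
    )


# Toy, hand-set values (assumptions): a 2-dimensional query-side vector,
# a 3-dimensional document-side vector, and a 2x3 matrix relating them.
query_vec = [1.0, 0.0]
doc_vec = [0.0, 1.0, 0.0]
W = [
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.5],
]

print(match_score(query_vec, W, doc_vec))  # 0.9
```

Because W is rectangular, the two sides need not share a vocabulary or a dimensionality, which is what would let the same form apply to query-ad or user-item pairs.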

References

[1] M. Bendersky, D. Metzler, and W. B. Croft. Parameterized concept weighting in verbose queries. In SIGIR, pp. 605--614, 2011.
[2] P. N. Bennett, K. Svore, and S. T. Dumais. Classification-enhanced ranking. In WWW, pp. 111--120, 2010.
[3] E. Brill and R. C. Moore. An improved error model for noisy channel spelling correction. In ACL, 2000.
[4] A. Broder, P. Ciccolo, E. Gabrilovich, V. Josifovski, D. Metzler, L. Riedel, and J. Yuan. Online expansion of rare queries for sponsored search. In WWW, pp. 511--520, 2009.
[5] J. Gao, K. Toutanova, and W.-t. Yih. Clickthrough-based latent semantic models for web search. In SIGIR, pp. 675--684, 2011.
[6] J. Guo, G. Xu, X. Cheng, and H. Li. Named entity recognition in query. In SIGIR, pp. 267--274, 2009.
[7] D. Sheldon, M. Shokouhi, M. Szummer, and N. Craswell. LambdaMerge: merging the results of query reformulations. In WSDM, pp. 795--804, 2011.
[8] C. Wang, R. Raina, D. Fong, D. Zhou, J. Han, and G. Badros. Learning relevance from heterogeneous social network and its application in online targeting. In SIGIR, pp. 655--664, 2011.
[9] Q. Wang, J. Xu, H. Li, and N. Craswell. Regularized latent semantic indexing. In SIGIR, pp. 685--694, 2011.
[10] Z. Wang, G. Xu, H. Li, and M. Zhang. A fast and accurate method for approximate string search. In ACL-HLT, pp. 52--61, 2011.
[11] W. Wu, H. Li, J. Xu, and S. Oyama. Learning a robust relevance model for search using kernel methods. JMLR, pp. 1429--1458, 2011.
[12] J. Xu, W. Wu, H. Li, and G. Xu. A kernel approach to addressing term mismatch. In WWW, pp. 153--154, 2011.
[13] J. Xu and G. Xu. Learning similarity function for rare queries. In WSDM, pp. 615--624, 2011.

Index Terms

Machine learning for query-document matching in search
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published in
WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining
February 2012
792 pages
ISBN: 9781450307475
DOI: 10.1145/2124295
General Chairs: Eytan Adar (University of Michigan, USA), Jaime Teevan (Microsoft Research, USA)
Program Chairs: Eugene Agichtein (Emory University, USA), Yoelle Maarek (Yahoo! Research, Israel)
Copyright © 2012 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
machine learning
query-document matching
web search
Qualifiers
- tutorial