skip to main content
10.1145/2124295.2124393acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
tutorial

Machine learning for query-document matching in search

Published:08 February 2012Publication History

ABSTRACT

In web search, relevance is one of the most important factors to meet users' satisfaction, and the success of a web search engine heavily depends on its performance on relevance. It has been observed that many hard cases in search relevance are due to term mismatch between query and documnt (e.g., query 'ny times' does not match well with document only containing 'new york times'), and thus it is not exaggerated to say that dealing with mismatch between query and document is one of the most critical research problems in web search. Recently researchers have spent significant effort to address the grand challenge. The major approach is to conduct more query and document understanding, and perform better matching between enriched query and document representations. With the availability of large amount of log data and advanced machine learning techniques, this becomes more feasible and significant progress has been made recently.

In this tutorial, we will give a systematic and detailed presentation on newly developed machine learning technologies for query document matching in search. We will focus on the fundamental problems, as well as the novel solutions for query document matching at word form level, word sense level, topic level, and structure level. We will talk about novel technologies about query spelling error correction [3, 13], query rewriting [1, 4, 6, 7], query classification [2], topic modeling of documents [5, 9], query document matching [8, 10, 11, 12], and query document-title translation. The ideas and solutions introduced in this tutorial may motivate industrial practitioners to turn the research fruits into product reality. The summary of the state-of-the-art methods and the discussions on the technical issues in this tutorial may stimulate academic researchers to find new research directions and solutions.

Matching between query and document is not limited to search, and similar problems can be observed at online advertisement, recommendation system, and other applications, as matching between objects from two spaces. The technologies we introduce can be generalized into more general machine learning techniques, which we call learning to match.

References

  1. M. Bendersky, D. Metzler, and W. B. Croft. Parameterized concept weighting in verbose queries. In SIGIR, pp. 605--614, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. N. Bennett, K. Svore, and S. T. Dumais. Classification-enhanced ranking. In WWW, pp. 111--120, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. E. Brill and R. C. Moore. An improved error model for noisy channel spelling correction. In ACL, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Broder, P. Ciccolo, E. Gabrilovich, V. Josifovski, D. Metzler, L. Riedel, and J. Yuan. Online expansion of rare queries for sponsored search. In WWW, pp. 511--520, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Gao, K. Toutanova, and W.-t. Yih. Clickthrough-based latent semantic models for web search. In SIGIR, pp. 675--684, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Guo, G. Xu, X. Cheng, and H. Li. Named entity recognition in query. In SIGIR, pp. 267--274, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. Sheldon, M. Shokouhi, M. Szummer, and N. Craswell. Lambdamerge: merging the results of query reformulations. In WSDM, pp. 795--804, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Wang, R. Raina, D. Fong, D. Zhou, J. Han, and G. Badros. Learning relevance from heterogeneous social network and its application in online targeting. In SIGIR, pp. 655--664, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Q. Wang, J. Xu, H. Li, and N. Craswell. Regularized latent semantic indexing. In SIGIR'11, pp. 685--694. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Z. Wang, G. Xu, H. Li, and M. Zhang. A fast and accurate method for approximate string search. In ACL-HLT, pp. 52--61, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. W. Wu, H. Li, J. Xu, and S. Oyama. Learning a robust relevance model for search using kernel methods. JMLR, pp. 1429--1458, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Xu, W. Wu, H. Li, and G. Xu. A kernel approach to addressing term mismatch. In WWW, pp. 153--154, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Xu and G. Xu. Learning similarity function for rare queries. In WSDM, pp. 615--624, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Machine learning for query-document matching in search

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader