Clickthrough-based latent semantic models for web search

Authors:
Jianfeng Gao, Microsoft Research, Redmond, WA, USA
Kristina Toutanova, Microsoft Research, Redmond, WA, USA
Wen-tau Yih, Microsoft Research, Redmond, WA, USA

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, July 2011, Pages 675–684. https://doi.org/10.1145/2009916.2010007

Published: 24 July 2011

ABSTRACT

This paper presents two new document ranking models for Web search, based on semantic representation methods and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the documents clicked on for that query, a large number of query-title pairs are constructed from clickthrough data, and two latent semantic models are learned from these pairs. One is a bilingual topic model within the language modeling framework. It ranks documents for a query by the likelihood of the query being a semantics-based translation of the documents. The semantic representation is language independent and learned from query-title pairs, with the assumption that a query and its paired titles share the same distribution over semantic topics. The other is a discriminative projection model within the vector space modeling framework. Unlike Latent Semantic Analysis and its variants, the projection matrix in our model, which maps term vectors into the semantic space, is learned discriminatively, such that the distance between a query and its paired title, both represented as vectors in the projected semantic space, is smaller than that between the query and the titles of other documents that have no clicks for that query. These models are evaluated on the Web search task using a real-world data set. Results show that they significantly outperform their corresponding baseline models, which are state-of-the-art.
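To make the discriminative projection idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes cosine similarity in the projected space and a logistic pairwise loss with a scaling factor `gamma`; the abstract only states that a query should end up closer to its clicked title than to titles without clicks. All function names, the toy vocabulary, the hand-set projection matrix `A`, and the click log are hypothetical.

```python
import math
from collections import Counter

def build_pairs(click_log):
    """Keep (query, title) pairs that received at least one click."""
    return [(q, t) for q, t, clicks in click_log if clicks > 0]

def term_vector(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary (unknown words are ignored)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def project(A, x):
    """Map a |V|-dim term vector x to k dims: y = A^T x, with A stored as |V| rows of length k."""
    k = len(A[0])
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(k)]

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def pairwise_loss(A, q, t_clicked, t_other, gamma=10.0):
    """Assumed logistic loss: small when the clicked title is more similar to the query."""
    yq = project(A, q)
    delta = cosine(yq, project(A, t_clicked)) - cosine(yq, project(A, t_other))
    return math.log(1.0 + math.exp(-gamma * delta))

# Toy data (hypothetical): three travel terms share one latent dimension.
vocab = ["cheap", "flights", "airfare", "recipes"]
A = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
click_log = [("cheap flights", "Airfare deals", 3),
             ("cheap flights", "Pasta recipes", 0)]

pairs = build_pairs(click_log)
q = term_vector("cheap flights", vocab)
clicked = term_vector("Airfare deals", vocab)
other = term_vector("Pasta recipes", vocab)

print("term-space similarity:", cosine(q, clicked))
print("projected similarity:", cosine(project(A, q), project(A, clicked)))
print("pairwise loss:", pairwise_loss(A, q, clicked, other))
```

In this toy setup the query and its clicked title share no terms, so term-level matching gives no signal, while the projected vectors align. A learned model would adjust `A` to reduce the pairwise loss over many query-title pairs instead of setting it by hand.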

Index Terms

Clickthrough-based latent semantic models for web search
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916
General Chairs: Wei-Ying Ma (Microsoft Research Asia, China), Jian-Yun Nie (University of Montreal, Canada)
Program Chairs: Ricardo Baeza-Yates (Yahoo! Research, Spain), Tat-Seng Chua (National University of Singapore), W. Bruce Croft (University of Massachusetts, Amherst, USA)
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clickthrough data
latent semantic analysis
linear projection
topic model
translation model
web search
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%