DOI:10.1145/2009916.2010007
research-article

Clickthrough-based latent semantic models for web search

Published: 24 July 2011

ABSTRACT

This paper presents two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the documents clicked on for that query, large numbers of query-title pairs are constructed from clickthrough data; two latent semantic models are learned from this data. One is a bilingual topic model within the language modeling framework. It ranks documents for a query by the likelihood of the query being a semantics-based translation of the documents. The semantic representation is language independent and learned from query-title pairs, with the assumption that a query and its paired titles share the same distribution over semantic topics. The other is a discriminative projection model within the vector space modeling framework. Unlike Latent Semantic Analysis and its variants, the projection matrix in our model, which is used to map from term vectors into semantic space, is learned discriminatively such that the distance between a query and its paired title, both represented as vectors in the projected semantic space, is smaller than that between the query and the titles of other documents which have no clicks for that query. These models are evaluated on the Web search task using a real-world data set. Results show that they significantly outperform their corresponding baseline models, which are state-of-the-art.
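
The discriminative projection idea in the abstract can be illustrated with a small sketch. It is not the authors' training procedure: the abstract does not state the loss or the distance measure, so the hinge loss, the squared Euclidean distance, the learning rate, the four-term toy vocabulary, and all function names below are assumptions made for illustration only. The sketch learns a projection matrix `A` so that a query lands closer to its clicked title than to an unclicked one.

```python
import numpy as np


def sq_dist(A, a, b):
    """Squared Euclidean distance between term vectors a and b after projection by A."""
    diff = A.T @ (a - b)
    return float(diff @ diff)


def train_projection(triples, vocab_size, k=2, margin=1.0, lr=0.05,
                     epochs=200, seed=0):
    """Learn A (vocab_size x k) from (query, clicked_title, unclicked_title) triples.

    Assumed objective (not taken from the paper): the pairwise hinge loss
    max(0, margin + d(q, t_pos) - d(q, t_neg)), minimised by plain
    stochastic gradient descent.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.1, size=(vocab_size, k))
    for _ in range(epochs):
        for q, t_pos, t_neg in triples:
            loss = margin + sq_dist(A, q, t_pos) - sq_dist(A, q, t_neg)
            if loss <= 0:
                continue  # this triple already satisfies the margin
            dp = (q - t_pos)[:, None]  # column vector, shape (vocab_size, 1)
            dn = (q - t_neg)[:, None]
            # Gradient of ||A^T d||^2 with respect to A is 2 * d d^T A.
            grad = 2.0 * (dp @ (dp.T @ A) - dn @ (dn.T @ A))
            A -= lr * grad
    return A


def rank_titles(A, q, titles):
    """Return title indices ordered from closest to farthest in the projected space."""
    dists = [sq_dist(A, q, t) for t in titles]
    return [int(i) for i in np.argsort(dists)]


# Hypothetical toy vocabulary: ["cheap", "flights", "airfare", "recipes"]
q = np.array([1.0, 1.0, 0.0, 0.0])      # query: "cheap flights"
t_pos = np.array([1.0, 0.0, 1.0, 0.0])  # clicked title: "cheap airfare"
t_neg = np.array([1.0, 0.0, 0.0, 1.0])  # unclicked title: "cheap recipes"

A = train_projection([(q, t_pos, t_neg)], vocab_size=4)
order = rank_titles(A, q, [t_neg, t_pos])
```

In the raw term space both titles are equally far from the query, since each shares exactly one term with it. After training, the clicked title is closer in the projected space, so `order` lists it first. A ranker built this way would score candidate documents for a new query by their projected distance to it.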

      • Published in

        SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
        July 2011
        1374 pages
        ISBN:9781450307574
        DOI:10.1145/2009916

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
