DOI: 10.1145/1871437.1871567

Probabilistic first pass retrieval for search advertising: from theory to practice

Published: 26 October 2010

Abstract

Information retrieval in search advertising, as in other ad-hoc retrieval tasks, aims to find the most appropriate ranking of the ad documents in a corpus for a given query. In addition to ranking the ad documents, we also need to filter out irrelevant ads, by thresholding, so that they do not participate in the auction to be displayed alongside search results. In this work, we describe our experience in implementing a successful ad retrieval system for a commercial search engine based on the Language Modeling (LM) framework for retrieval. The LM demonstrates significant performance improvements over the baseline vector space model (TF-IDF) system that was in production at the time. From a modeling perspective, we propose a novel approach to incorporating query segmentation and phrases in the LM framework, discuss the impact of score normalization on relevance filtering, and present preliminary results of incorporating query expansions using query rewriting techniques. From an implementation perspective, we also discuss the real-time latency constraints of a production search engine and how we overcome them by adapting the WAND algorithm to work with language models. In sum, our LM formulation is considerably better in terms of accuracy metrics such as Precision-Recall (10% improvement in AUC) and nDCG (8% improvement in nDCG@5) on editorial data, and it also demonstrates significant improvements in clicks in live user tests (0.787% improvement in Click Yield, with an 8% coverage increase). Finally, we hope that this paper provides the reader with adequate insight into the challenges of building a system that serves millions of users every day.
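
The ranking-plus-thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a plain unigram query-likelihood model with Dirichlet smoothing (a common choice in the LM retrieval literature), and it assumes that "score normalization" means dividing the log-likelihood by the query length so that a single threshold can be applied across queries of different lengths. The function names, the `mu` and `threshold` parameters, and the floor count of 0.5 for unseen terms are all hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_tf, collection_len, mu=10.0):
    """Return log P(query | ad) under a Dirichlet-smoothed unigram model."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background (collection) probability. Terms never seen in the corpus
        # get a floor count of 0.5, which is an assumption of this sketch.
        p_background = collection_tf.get(term, 0.5) / collection_len
        p_term = (doc_tf[term] + mu * p_background) / (doc_len + mu)
        score += math.log(p_term)
    return score


def rank_and_filter(query, ads, mu=10.0, threshold=-1.8):
    """Rank ads by length-normalized query log-likelihood and drop low scorers."""
    query_terms = query.lower().split()
    docs = {ad_id: text.lower().split() for ad_id, text in ads.items()}
    collection_tf = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_tf.values())

    scored = []
    for ad_id, terms in docs.items():
        ll = query_log_likelihood(query_terms, terms, collection_tf, collection_len, mu)
        # Assumed normalization: average log-probability per query term.
        normalized = ll / max(len(query_terms), 1)
        if normalized >= threshold:
            scored.append((ad_id, normalized))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_ads = {
        "a1": "cheap running shoes online",
        "a2": "luxury car rental deals",
        "a3": "running shoes sale",
    }
    for ad_id, score in rank_and_filter("running shoes", toy_ads):
        print(ad_id, round(score, 3))
```

On this toy corpus, the two ads that contain both query terms pass the threshold and the unrelated ad is filtered out before it could enter an auction. In a production setting the threshold and smoothing parameter would need to be tuned on labeled data.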

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN: 9781450300995
DOI: 10.1145/1871437
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. advertising
  2. language models
  3. sponsored search

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2018) Computational Advertising. Foundations and Trends in Information Retrieval 8:4–5 (263-418). DOI: 10.1561/1500000045. Online publication date: 14-Dec-2018
  • (2018) Scalable Query N-Gram Embedding for Improving Matching and Relevance in Sponsored Search. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (52-61). DOI: 10.1145/3219819.3219897. Online publication date: 19-Jul-2018
  • (2016) The Role of Relevance in Sponsored Search. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (185-194). DOI: 10.1145/2983323.2983840. Online publication date: 24-Oct-2016
  • (2015) Effective Healthcare Advertising Using Latent Dirichlet Allocation and Inference Engine. Advances in Information Retrieval (672-677). DOI: 10.1007/978-3-319-16354-3_74. Online publication date: 2015
  • (2013) User-Aware Advertisability. Information Retrieval Technology (452-463). DOI: 10.1007/978-3-642-45068-6_39. Online publication date: 2013
  • (2011) A language model approach to capture commercial intent and information relevance for sponsored search. Proceedings of the 20th ACM international conference on Information and knowledge management (599-604). DOI: 10.1145/2063576.2063665. Online publication date: 24-Oct-2011
  • (2011) The sum of its parts: reducing sparsity in click estimation with query segments. Information Retrieval 14:3 (315-336). DOI: 10.1007/s10791-010-9152-6. Online publication date: 3-Feb-2011
