research-article

Improving Tail Query Performance by Fusion Model

Authors:

Shaoping Ma

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

Pages 559 - 568

https://doi.org/10.1145/2661829.2661943

Published: 03 November 2014

Abstract

Tail queries, which occur with low frequency, make up a large fraction of unique queries and often affect a user's experience during Web search. Because of data sparseness, little information is available to leverage for tail queries, so improving their performance is both important and difficult. We observe that 26% of tail queries are not inherently rare: they are phrased in an unusual way, but the underlying information needs are common. In this study, we improve tail query performance by fusing the results of the original query with those of its query reformulation candidates. Beyond re-ranking existing results, the fusion model can introduce new results. We emphasize that poorly performing queries are not the only ones that can be improved, and we propose to extract features that predict whether a query's performance can be improved. We then use a learning-to-rank method, trained to directly optimize a retrieval metric, to fuse the documents and obtain a final result list. We conducted experiments using data from two popular Chinese search engines. The results indicate that our fusion method significantly improves the performance of tail queries and outperforms state-of-the-art approaches on the same reformulations. Experiments show that our method is effective for non-tail queries as well.
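To make the idea of fusing result lists concrete, the following is a minimal sketch, not the learning-to-rank model described in the abstract. It assumes a simple weighted reciprocal-rank combination as a stand-in for the trained fusion model; the function name `fuse`, the source names, the document ids, and the hand-set weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fuse(result_lists, weights, top_k=10):
    """Fuse several ranked lists into one ranked list.

    result_lists: dict mapping a source name (hypothetical, e.g. "original")
                  to a ranked list of document ids, best first.
    weights:      dict mapping each source name to a hand-set weight.
                  Assumption: these stand in for what a trained
                  learning-to-rank model would learn from features.
    """
    scores = defaultdict(float)
    for source, docs in result_lists.items():
        for rank, doc in enumerate(docs, start=1):
            # Each list contributes weight / rank to the document's score.
            scores[doc] += weights[source] / rank
    # Sort by descending score; break ties by document id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in ranked[:top_k]]


# Toy data: one original query and two reformulation candidates.
result_lists = {
    "original": ["d1", "d2", "d3"],
    "reformulation_a": ["d4", "d1", "d5"],
    "reformulation_b": ["d4", "d2", "d6"],
}
weights = {"original": 1.0, "reformulation_a": 0.6, "reformulation_b": 0.5}

fused = fuse(result_lists, weights, top_k=5)
print(fused)
```

In this toy example, document `d4` never appears in the original query's list but is ranked highly by both reformulations, so it enters the fused list. This mirrors the abstract's point that fusion can introduce new results rather than only re-rank existing ones.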


Index Terms

Improving Tail Query Performance by Fusion Model
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published In

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

November 2014

2152 pages

ISBN: 9781450325981

DOI: 10.1145/2661829

General Chairs: Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)

Program Chairs: Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '14

CIKM '14: 2014 ACM Conference on Information and Knowledge Management

November 3 - 7, 2014

Shanghai, China

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 4
Total Downloads: 267
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 1

Reflects downloads up to 27 Feb 2025
Citations

Cited By

Maji S, Patel P, Thakarar B, Kumar M, Tripathi K (2020). A Regularised Intent Model for Discovering Multiple Intents in E-Commerce Tail Queries. Advances in Information Retrieval, pages 651-665. DOI: 10.1007/978-3-030-45439-5_43. Online publication date: 14-Apr-2020.
https://dl.acm.org/doi/10.1007/978-3-030-45439-5_43
Maji S, Kumar R, Bansal M, Roy K, Kumar M, Goyal P, Piwowarski B, Chevalier M, Gaussier E, Maarek Y, Nie J, Scholer F (2019). Addressing Vocabulary Gap in E-commerce Search. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1073-1076. DOI: 10.1145/3331184.3331323. Online publication date: 18-Jul-2019.
https://dl.acm.org/doi/10.1145/3331184.3331323
Kurland O, Culpepper J, Collins-Thompson K, Mei Q, Davison B, Liu Y, Yilmaz E (2018). Fusion in Information Retrieval. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 1383-1386. DOI: 10.1145/3209978.3210186. Online publication date: 27-Jun-2018.
https://dl.acm.org/doi/10.1145/3209978.3210186
Yin D, Hu Y, Tang J, Daly T, Zhou M, Ouyang H, Chen J, Kang C, Deng H, Nobata C, Langlois J, Chang Y, Krishnapuram B, Shah M, Smola A, Aggarwal C, Shen D, Rastogi R (2016). Ranking Relevance in Yahoo Search. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 323-332. DOI: 10.1145/2939672.2939677. Online publication date: 13-Aug-2016.
https://dl.acm.org/doi/10.1145/2939672.2939677
