Optimizing top-k retrieval: submodularity analysis and search strategies

Sha, Chaofeng; Wang, Keqiang; Zhang, Dell; Wang, Xiaoling; Zhou, Aoying

doi:10.1007/s11704-015-5222-7

Research Article
Published: 19 January 2016

Volume 10, pages 477–487, (2016)
Frontiers of Computer Science

Chaofeng Sha¹,
Keqiang Wang²,
Dell Zhang³,
Xiaoling Wang² &
Aoying Zhou²

Abstract

The key issue in top-k retrieval, that is, finding a set of k documents (from a large document collection) that best answers a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of its objective function, which provides a theoretical approximation guarantee of factor $1 - \frac{1}{e}$ for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy that first obtains a high-quality initial set of top-k documents via greedy search and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy outperforms the existing ones in both effectiveness and efficiency.
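The two-stage strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the code below assumes the textbook facility-location function f(S) = Σ_i max_{j∈S} sim(i, j) over a non-negative similarity matrix, which is monotone and submodular, so greedy selection is within a factor of $1 - \frac{1}{e}$ of the optimum. The names `coverage`, `greedy`, `local_search`, and the toy matrix `sim` are hypothetical and are not taken from the paper.

```python
from itertools import product


def coverage(selected, sim):
    """Assumed facility-location objective: every document is credited with
    its similarity to the most similar selected document."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy(sim, k):
    """Stage 1: repeatedly add the document that yields the largest objective value."""
    selected = []
    candidates = set(range(len(sim)))
    for _ in range(min(k, len(sim))):
        best = max(sorted(candidates),
                   key=lambda j: coverage(selected + [j], sim))
        selected.append(best)
        candidates.remove(best)
    return selected


def local_search(sim, selected, tol=1e-12):
    """Stage 2: swap one selected document for one unselected document
    whenever the swap strictly improves the objective; stop at a local optimum."""
    selected = list(selected)
    improved = True
    while improved:
        improved = False
        current = coverage(selected, sim)
        outside = [j for j in range(len(sim)) if j not in selected]
        for pos, cand in product(range(len(selected)), outside):
            trial = selected[:pos] + [cand] + selected[pos + 1:]
            if coverage(trial, sim) > current + tol:
                selected, improved = trial, True
                break
    return selected


# Toy similarity matrix (illustrative data only): documents 0-1 and 2-3 form two clusters.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
top = greedy(sim, 2)
refined = local_search(sim, top)
print(top, coverage(top, sim))
print(refined, coverage(refined, sim))
```

On this toy input the greedy stage picks one document from each cluster, which shows how a coverage-style objective rewards diversity. The swap-based refinement can never lower the objective, because it accepts only strictly improving swaps.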

Author information

Authors and Affiliations

School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, 200433, China
Chaofeng Sha
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, 200062, China
Keqiang Wang, Xiaoling Wang & Aoying Zhou
Department of Computer Science and Information Systems, Birkbeck, University of London, London, WC1E 7HX, UK
Dell Zhang

Corresponding author

Correspondence to Chaofeng Sha.

Additional information

Chaofeng Sha is an associate professor at Fudan University, China. He received the BS degree in applied mathematics in 1998 from Xidian University, China, and the MS degree in 2001 and the PhD degree in 2009 from Fudan University, China, both in computer science. Since 2001, he has been in the School of Computer Science at Fudan University. His work is in the area of data mining and data management.

Keqiang Wang received the bachelor's degree from East China Normal University (ECNU), China in 2012. He is currently a PhD student at ECNU. His research interests mainly focus on recommender systems and data mining.

Dell Zhang is a senior lecturer in the Department of Computer Science and Information Systems at Birkbeck, University of London (UOL), UK. He is also a senior member of ACM, a senior member of IEEE, and a Fellow of RSS. He joined Birkbeck in 2005. Before he moved to the UK, he was a research fellow at the Singapore-MIT Alliance. His research is on the theme of improving information retrieval and organisation through machine learning or data mining.

Xiaoling Wang received the bachelor's, master's, and doctoral degrees from Southeastern University, China in 1997, 2000, and 2003, respectively. She is currently a professor and vice dean in the Software Engineering Institute, East China Normal University (ECNU), China. She was an assistant professor and an associate professor at Fudan University from 2003 to 2008, and joined ECNU in 2008. She was selected for the New-Century Talent Program of the Ministry of Education of China. Her research interests mainly include Web data management, data mining and data service technology.

Aoying Zhou is a professor in computer science at East China Normal University (ECNU), China, where he heads the Institute of Massive Computing. Before joining ECNU in 2008, he worked in the Computer Science Department at Fudan University for 15 years. He is a winner of the National Science Fund for Distinguished Young Scholars, supported by the National Natural Science Foundation of China, and holds a professorship under the Changjiang Scholars Program of the Ministry of Education. He is now a vice director of ACM SIGMOD China and of the Database Technology Committee of the China Computer Federation. He serves on the editorial boards of VLDB Journal, WWW Journal, etc. His research interests include data management, memory cluster computing, big data benchmarking and performance optimization.

Electronic supplementary material

Supplementary material, approximately 309 KB.

About this article

Cite this article

Sha, C., Wang, K., Zhang, D. et al. Optimizing top-k retrieval: submodularity analysis and search strategies. Front. Comput. Sci. 10, 477–487 (2016). https://doi.org/10.1007/s11704-015-5222-7

Received: 09 June 2015
Accepted: 09 September 2015
Published: 19 January 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11704-015-5222-7
