research-article

Fast Filtering of Search Results Sorted by Attribute

Published: 24 November 2021

Abstract

Modern search services often provide multiple options to rank the search results, e.g., sort “by relevance”, “by price” or “by discount” in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, the rank by attribute could place many marginally relevant results at the head of the results list, leading to a poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks for the subset of results that maximizes the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, its high computational cost makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1−ϵ)-optimal filtering, i.e., the relevance of its solution is at least (1−ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency.
In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
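To make the relevance-aware filtering problem concrete, here is a small dynamic-programming sketch. It is a textbook O(n²) formulation written for illustration, not the article's OPT-Filtering or ϵ-Filtering algorithm. It assumes that every result in the attribute-sorted list carries a precomputed relevance gain and that the quality of the filtered list is measured by DCG with a log2(j + 1) discount at 1-based position j; the function names `dcg` and `optimal_filter` are hypothetical.

```python
import math
from typing import List, Tuple


def dcg(gains: List[float]) -> float:
    """DCG of a list: the gain at 1-based position j is divided by log2(j + 1)."""
    return sum(g / math.log2(j + 1) for j, g in enumerate(gains, start=1))


def optimal_filter(gains: List[float]) -> Tuple[float, List[int]]:
    """Choose an order-preserving subset of `gains` (already sorted by the
    attribute) that maximizes the DCG of the filtered list.

    Illustrative O(n^2) dynamic program under the assumed DCG metric;
    it is NOT the OPT-Filtering algorithm from the article.
    """
    n = len(gains)
    # best[j] = max DCG of a filtered list holding exactly j items
    # chosen from the prefix of results processed so far
    best = [0.0] + [float("-inf")] * n
    # took[i][j] is True if item i was placed at position j when best[j] was last improved
    took = [[False] * (n + 1) for _ in range(n)]
    for i, g in enumerate(gains):
        # Walk j downward so best[j - 1] still describes the previous prefix.
        for j in range(i + 1, 0, -1):
            candidate = best[j - 1] + g / math.log2(j + 1)
            if candidate > best[j]:
                best[j] = candidate
                took[i][j] = True
    # The best filtered list may have any length from 0 to n.
    best_len = max(range(n + 1), key=best.__getitem__)
    # Backtrack: the latest item that improved best[k] is the k-th selected item.
    selected: List[int] = []
    k = best_len
    for i in range(n - 1, -1, -1):
        if k > 0 and took[i][k]:
            selected.append(i)
            k -= 1
    selected.reverse()
    return best[best_len], selected
```

For example, `optimal_filter([0.0, 3.0, 1.0])` keeps indices `[1, 2]`. Dropping the zero-gain first result moves the two relevant results up one position each, which raises the DCG from about 2.39 to about 3.63. Because the table grows with the square of the list length, even this simple formulation suggests why exact filtering can become costly on long Web result lists, as the abstract notes. On small inputs, its optimum can also serve as the reference value when checking whether a faster method meets the (1−ϵ) guarantee.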


Information

Published In

ACM Transactions on Information Systems, Volume 40, Issue 2
April 2022
587 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/3484931
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 November 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 December 2020
Published in TOIS Volume 40, Issue 2

Author Tags

  1. Relevance-aware filtering
  2. filtering algorithms
  3. approximation algorithms
  4. efficiency-effectiveness trade-offs

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • “Algorithms, Data Structures and Combinatorics for Machine Learning”
  • OK-INSAID
  • MIUR-PON 2018

Article Metrics

  • Total Citations: 0
  • Total Downloads: 156
  • Downloads (Last 12 months): 18
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 05 Mar 2025