research-article

Fast Filtering of Search Results Sorted by Attribute

Published: 24 November 2021

Abstract

Modern search services often offer several ways to rank search results, e.g., sorting “by relevance”, “by price”, or “by discount” in e-commerce. While the traditional ranking by relevance effectively places the relevant results at the top of the results list, ranking by attribute can place many marginally relevant results at the head of the list, leading to a poor user experience. In the past, this issue has been addressed by studying the relevance-aware filtering problem, which asks to select the subset of results that maximizes the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of that algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1-ϵ)-optimal filtering, i.e., the relevance of its solution is at least (1-ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
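
To make the problem statement concrete, the following is a minimal Python sketch of relevance-aware filtering solved by exhaustive search. It is not OPT-Filtering or ϵ-Filtering, whose details are not given in the abstract. Several choices here are assumptions made only for illustration: relevance of a list is measured with DCG, at most `k` results may be kept, and the function names `dcg`, `brute_force_filter`, and `is_approx_optimal` are hypothetical. The input is the list of relevance labels in attribute order (for example, by increasing price); filtering removes results but never reorders them.

```python
from itertools import combinations
from math import log2


def dcg(relevances):
    """Discounted cumulative gain of a list, top position first.

    Assumption: exponential gain (2^r - 1) and log2 position discount.
    """
    return sum((2 ** r - 1) / log2(i + 2) for i, r in enumerate(relevances))


def brute_force_filter(relevances, k):
    """Try every subset of at most k results and keep the best one.

    `relevances` is in attribute-sorted order. Index tuples produced by
    `combinations` are increasing, so the attribute order is preserved.
    Exponential time: usable only on tiny lists, as a reference point.
    """
    best_score, best_idx = 0.0, ()
    n = len(relevances)
    for size in range(1, min(k, n) + 1):
        for idx in combinations(range(n), size):
            score = dcg([relevances[i] for i in idx])
            if score > best_score:
                best_score, best_idx = score, idx
    return best_score, list(best_idx)


def is_approx_optimal(score, optimum, eps):
    """Check the (1 - eps)-optimality condition stated in the abstract."""
    return score >= (1 - eps) * optimum


if __name__ == "__main__":
    # Hypothetical relevance labels of six results sorted by price.
    by_price = [0, 3, 1, 0, 2, 3]
    optimum, kept = brute_force_filter(by_price, k=3)
    print("unfiltered DCG:", round(dcg(by_price), 3))
    print("kept positions:", kept, "DCG:", round(optimum, 3))
```

On this toy list, dropping the marginally relevant results moves the highly relevant ones toward the top while keeping them in price order, which is the effect the filtering problem is after. The exhaustive search examines a number of subsets that grows exponentially with the list length, which illustrates why faster exact and approximate algorithms matter for long result lists.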

    • Published in

      ACM Transactions on Information Systems  Volume 40, Issue 2
      April 2022
      587 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/3484931
      Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 November 2021
      • Accepted: 1 July 2021
      • Revised: 1 June 2021
      • Received: 1 December 2020

      Qualifiers

      • research-article
      • Refereed