Abstract
Modern search services often provide multiple options for ranking search results, e.g., sort “by relevance”, “by price” or “by discount” in e-commerce. While the traditional ranking by relevance effectively places the relevant results in the top positions of the results list, ranking by attribute can place many marginally relevant results at the head of the results list, leading to a poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results maximizing the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1-ϵ)-optimal filtering, i.e., the relevance of its solution is at least (1-ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
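To make the problem statement concrete, the following is a minimal illustrative sketch, not the OPT-Filtering or ϵ-Filtering algorithm. It assumes that relevance is measured with DCG (gain 2^r - 1, logarithmic discount) and that at most k results may be kept; neither choice is stated in the abstract. The function names `dcg`, `brute_force_filter` and `is_within_epsilon` are hypothetical. The sketch enumerates every order-preserving subset of an attribute-sorted list and then checks the (1-ϵ) guarantee for a candidate score.

```python
from itertools import combinations
from math import log2


def dcg(relevances):
    """DCG of a list read top to bottom (assumed metric, not from the abstract)."""
    return sum((2 ** r - 1) / log2(pos + 2) for pos, r in enumerate(relevances))


def brute_force_filter(relevances, k):
    """Exhaustively find the best subset of at most k results.

    `relevances` is given in attribute order. Index tuples produced by
    `combinations` are increasing, so the kept results stay attribute-sorted.
    Returns (best_score, kept_indices).
    """
    best_score, best_indices = 0.0, ()
    n = len(relevances)
    for size in range(1, min(k, n) + 1):
        for indices in combinations(range(n), size):
            score = dcg([relevances[i] for i in indices])
            if score > best_score:
                best_score, best_indices = score, indices
    return best_score, best_indices


def is_within_epsilon(candidate_score, optimal_score, epsilon):
    """True if the candidate is a (1 - epsilon)-optimal filtering."""
    return candidate_score >= (1 - epsilon) * optimal_score


# Relevance labels of five results already sorted by, e.g., price.
relevances = [0, 3, 1, 0, 2]
best_score, kept = brute_force_filter(relevances, k=2)
print(kept, round(best_score, 4))
```

The enumeration is exponential in the list length, so it is only usable on toy inputs; this cost is what motivates efficient exact and approximate algorithms for the problem.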
Index Terms
- Fast Filtering of Search Results Sorted by Attribute