Abstract
In this work, we targeted the search engine of a sports-related website that presented an opportunity for search result quality improvement. We reframed the engine as a Federated Search instance, where each collection represented a searchable entity type within the system, using Apache Solr to query each resource and a Python Flask server to merge results. We extend previous work on individual search term weighting, using past search terms as a relevance indicator for user-selected documents. To incorporate term weights, we define four strategies that combine two binary variables: integration with default relevance (linear scaling or linear combination) and search term frequency (raw value or log-smoothed). To evaluate our solution, we extracted two query sets from search logs: one with frequently submitted queries, and another with ambiguous result access patterns. We used click-through information as a relevance proxy and tried to mitigate its limitations by evaluating under distinct IR metrics, including MRR, MAP and NDCG. We also measured Spearman rank correlation coefficients to test the similarity between the produced rankings and reference orderings derived from user access patterns. Results are consistent across all metrics in both sets. Previous search terms were key to obtaining higher effectiveness, with runs that used raw search term frequency performing best. Compared to the baseline, our best strategies maintained quality on frequent queries and improved retrieval effectiveness on ambiguous queries, by up to approximately six percentage points on most metrics.
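The four strategies can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the `1 + weight` form of linear scaling, the `alpha` mixing parameter of the linear combination, and the use of `log(1 + frequency)` as the log-smoothed variant are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical sketch: none of these names or formulas come from the paper.

def term_weight(freq: int, log_smoothed: bool) -> float:
    """Weight derived from how often a past search term led to a document.

    Assumption: the log-smoothed variant is log(1 + freq).
    """
    return math.log1p(freq) if log_smoothed else float(freq)


def combine(base_score: float, weight: float, mode: str, alpha: float = 0.5) -> float:
    """Integrate the term weight with the engine's default relevance score.

    Assumptions: "scaling" multiplies the base score by (1 + weight);
    "combination" is a weighted sum controlled by a hypothetical alpha.
    """
    if mode == "scaling":
        return base_score * (1.0 + weight)
    if mode == "combination":
        return alpha * base_score + (1.0 - alpha) * weight
    raise ValueError(f"unknown mode: {mode}")


def rerank(results, query_terms, term_freqs, mode, log_smoothed):
    """Re-score merged results and sort them by descending score.

    results: list of (doc_id, base_score) pairs from the merged collections.
    term_freqs: doc_id -> {past search term -> click count} (hypothetical shape).
    """
    rescored = []
    for doc_id, base_score in results:
        doc_terms = term_freqs.get(doc_id, {})
        # Sum the weights of the query terms previously used to reach this document.
        weight = sum(
            term_weight(doc_terms[t], log_smoothed)
            for t in query_terms
            if t in doc_terms
        )
        rescored.append((doc_id, combine(base_score, weight, mode)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy data: the team page was reached 10 times through the term "porto".
results = [("player:1", 2.0), ("team:7", 1.5)]
term_freqs = {"team:7": {"porto": 10}}

for mode in ("scaling", "combination"):
    for log_smoothed in (False, True):
        ranking = rerank(results, ["porto"], term_freqs, mode, log_smoothed)
        print(mode, "log" if log_smoothed else "raw", ranking)
```

In this toy case, all four variants move the document with query log evidence above the one that the default score alone would rank first. How strongly the log evidence should dominate is a design choice that the sketch leaves to `alpha` and to the choice between raw and log-smoothed frequency.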
Acknowledgements
This paper would not have been possible without the collaboration of the zerozero.pt team, who kindly provided us with continuously refined search logs that were the foundation of this work. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project LA/P/0063/2020.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Damas, J., Devezas, J., Nunes, S. (2022). Federated Search Using Query Log Evidence. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_64
DOI: https://doi.org/10.1007/978-3-031-16474-3_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16473-6
Online ISBN: 978-3-031-16474-3
eBook Packages: Computer Science, Computer Science (R0)