Federated Search Using Query Log Evidence

  • Conference paper
Progress in Artificial Intelligence (EPIA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13566)

Abstract

In this work, we targeted the search engine of a sports-related website that presented an opportunity for improving search result quality. We reframed the engine as a Federated Search instance, where each collection represented a searchable entity type within the system, using Apache Solr to query each resource and a Python Flask server to merge results. We extend previous work on individual search term weighting, using past search terms as a relevance indicator for user-selected documents. To incorporate term weights, we define four strategies that combine two binary variables: integration with default relevance (linear scaling or linear combination) and search term frequency (raw value or log-smoothed). To evaluate our solution, we extracted two query sets from search logs: one with frequently submitted queries and another with ambiguous result access patterns. We used click-through information as a relevance proxy and tried to mitigate its limitations by evaluating under distinct IR metrics, including MRR, MAP, and NDCG. We also measured Spearman rank correlation coefficients to test the similarity between the produced rankings and reference orderings derived from user access patterns. Results are consistent across all metrics in both sets. Previous search terms were key to obtaining higher effectiveness, with runs that used pure search term frequency performing best. Compared to the baseline, our best strategies maintained quality on frequent queries and improved retrieval effectiveness on ambiguous queries, with up to approximately six percentage points better performance on most metrics.
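
The four strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function names, the `log(1 + f)` smoothing, the `base * (1 + weight)` form of linear scaling, and the mixing parameter `alpha` for linear combination are hypothetical and are not taken from the paper.

```python
import math
from itertools import product


def term_weight(frequency: int, log_smoothed: bool) -> float:
    """Weight of a past search term for a document.

    Assumption: log smoothing is log(1 + frequency); the paper may differ.
    """
    if frequency < 0:
        raise ValueError("frequency must be non-negative")
    return math.log1p(frequency) if log_smoothed else float(frequency)


def rescore(base_score: float, frequency: int, *, integration: str,
            log_smoothed: bool, alpha: float = 0.5) -> float:
    """Combine the engine's default relevance score with a term weight.

    integration="scaling":     base * (1 + weight)   (hypothetical form)
    integration="combination": alpha * base + (1 - alpha) * weight
    alpha is a hypothetical mixing parameter, not a value from the paper.
    """
    weight = term_weight(frequency, log_smoothed)
    if integration == "scaling":
        # A document with no query log evidence keeps its default score.
        return base_score * (1.0 + weight)
    if integration == "combination":
        return alpha * base_score + (1.0 - alpha) * weight
    raise ValueError(f"unknown integration mode: {integration!r}")


def all_strategies(base_score: float, frequency: int) -> dict:
    """Score one document under each of the 2 x 2 strategy combinations."""
    return {
        (integration, log_smoothed): rescore(
            base_score, frequency,
            integration=integration, log_smoothed=log_smoothed,
        )
        for integration, log_smoothed in product(
            ("scaling", "combination"), (False, True)
        )
    }


if __name__ == "__main__":
    # Example: default relevance 2.0, term seen 3 times in past queries.
    for strategy, score in all_strategies(2.0, 3).items():
        print(strategy, round(score, 3))
```

Under these assumptions, raw frequency lets heavily searched terms dominate the final score, while log smoothing dampens that effect; which behaves better in practice is an empirical question that the paper's evaluation addresses.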

Notes

  1. https://lucidworks.com/post/solr-payloads/

  2. https://flask.palletsprojects.com/en/1.1.x/

  3. https://lucene.apache.org/solr/guide/8_4/the-extended-dismax-query-parser.html

Acknowledgements

This paper would not have been possible without the collaboration of the zerozero.pt team, who kindly provided us with the continuously refined search logs that were the foundation of this work. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project LA/P/0063/2020.

Author information

Corresponding author

Correspondence to Sérgio Nunes.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Damas, J., Devezas, J., Nunes, S. (2022). Federated Search Using Query Log Evidence. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_64

  • DOI: https://doi.org/10.1007/978-3-031-16474-3_64

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16473-6

  • Online ISBN: 978-3-031-16474-3

  • eBook Packages: Computer Science, Computer Science (R0)
