Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization

Abolghasemi, Amin; Verberne, Suzan; Azzopardi, Leif

doi:10.1007/978-3-030-99739-7_1

Amin Abolghasemi¹⁵,
Suzan Verberne¹⁵ &
Leif Azzopardi¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13186))

Included in the following conference series:

European Conference on Information Retrieval

3141 Accesses
14 Citations

Abstract

Query-by-document (QBD) retrieval is an Information Retrieval task in which a seed document acts as the query and the goal is to retrieve related documents – it is particular common in professional search tasks. In this work we improve the retrieval effectiveness of the BERT re-ranker, proposing an extension to its fine-tuning step to better exploit the context of queries. To this end, we use an additional document-level representation learning objective besides the ranking objective when fine-tuning the BERT re-ranker. Our experiments on two QBD retrieval benchmarks show that the proposed multi-task optimization significantly improves the ranking effectiveness without changing the BERT re-ranker or using additional training samples. In future work, the generalizability of our approach to other retrieval tasks should be further investigated.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Learning Query-Space Document Representations for High-Recall Retrieval

Incorporating Ranking Context for End-to-End BERT Re-ranking

Multi-stage enhanced representation learning for document reranking based on query view

Article 21 August 2024

Notes

1.
https://github.com/elastic/elasticsearch.
2.
Implementation from https://github.com/suzanv/termprofiling/.
3.
https://github.com/allenai/specter.

References

Ahmad, W.U., Chang, K.W., Wang, H.: Multi-task learning for document ranking and query suggestion. In: International Conference on Learning Representations (2018)
Google Scholar
Althammer, S., Hofstätter, S., Sertkan, M., Verberne, S., Hanbury, A.: Paragraph aggregation retrieval model (parm) for dense document-to-document retrieval. In: Advances in Information Retrieval, 44rd European Conference on IR Research, ECIR 2022 (2022)
Google Scholar
Askari, A., Verberne, S.: Combining lexical and neural retrieval with longformer-based summarization for effective case law retrieval. In: DESIRES (2021)
Google Scholar
Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3615–3620. Association for Computational Linguistics, Hong Kong (2019). https://doi.org/10.18653/v1/D19-1371, https://aclanthology.org/D19-1371
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., Hullender, G.: Learning to rank using gradient descent. In: Proceedings of the 22nd international conference on Machine learning - ICML 2005, pp. 89–96. ACM Press, Bonn (2005). https://doi.org/10.1145/1102351.1102363, http://portal.acm.org/citation.cfm?doid=1102351.1102363
Cao, Z., Qin, T., Liu, T.Y., Tsai, M.F., Li, H.: Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine Learning, pp. 129–136 (2007)
Google Scholar
Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., Androutsopoulos, I.: LEGAL-BERT: the muppets straight out of law school. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2898–2904. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.261, https://aclanthology.org/2020.findings-emnlp.261
Cheng, Q., Ren, Z., Lin, Y., Ren, P., Chen, Z., Liu, X., de Rijke, M.D.: Long short-term session search: joint personalized reranking and next query prediction. In: Proceedings of the Web Conference 2021, pp. 239–248 (2021)
Google Scholar
Cohan, A., Feldman, S., Beltagy, I., Downey, D., Weld, D.: SPECTER: document-level representation learning using citation-informed transformers. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2270–2282. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.207, https://www.aclweb.org/anthology/2020.acl-main.207
Dai, Z., Callan, J.: Deeper text understanding for IR with contextual neural language modeling. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 985–988 (2019)
Google Scholar
Dai, Z., Callan, J.: Context-aware term weighting for first stage passage retrieval. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1533–1536 (2020)
Google Scholar
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis (2019). https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
Fujii, A., Iwayama, M., Kando, N.: Overview of the patent retrieval task at the ntcir-6 workshop. In: NTCIR (2007)
Google Scholar
Guo, J., Fan, Y., Ai, Q., Croft, W.B.: A deep relevance matching model for ad-hoc retrieval. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 55–64 (2016)
Google Scholar
Humeau, S., Shuster, K., Lachaux, M.A., Weston, J.: Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=SkxgnnNFvH
Huston, S., Croft, W.B.: Evaluating verbose query processing techniques. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 291–298 (2010)
Google Scholar
Kongyoung, S., Macdonald, C., Ounis, I.: Multi-task learning using dynamic task weighting for conversational question answering. In: Proceedings of the 5th International Workshop on Search-Oriented Conversational AI (SCAI), pp. 17–26 (2020)
Google Scholar
Lin, J., Nogueira, R., Yates, A.: Pretrained transformers for text ranking: bert and beyond (2021)
Google Scholar
Liu, S., Liang, Y., Gitter, A.: Loss-balanced task weighting to reduce negative transfer in multi-task learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9977–9978 (2019)
Google Scholar
Liu, X., He, P., Chen, W., Gao, J.: Multi-task deep neural networks for natural language understanding. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4487–4496. Association for Computational Linguistics, Florence (2019). https://doi.org/10.18653/v1/P19-1441, https://aclanthology.org/P19-1441
Locke, D., Zuccon, G., Scells, H.: Automatic query generation from legal texts for case law retrieval. In: Asia Information Retrieval Symposium, pp. 181–193. Springer (2017). https://doi.org/10.1007/978-3-319-70145-5_14
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=Bkg6RiCqY7
Ma, Y., Shao, Y., Liu, B., Liu, Y., Zhang, M., Ma, S.: Retrieving legal cases from a large-scale candidate corpus. In: Proceedings of the Eighth International Competition on Legal Information Extraction/Entailment, COLIEE2021 (2021)
Google Scholar
MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: Cedr: contextualized embeddings for document ranking. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1101–1104 (2019)
Google Scholar
Mysore, S., O’Gorman, T., McCallum, A., Zamani, H.: Csfcube-a test collection of computer science research articles for faceted query by example. arXiv preprint arXiv:2103.12906 (2021)
Nogueira, R., Cho, K.: Passage re-ranking with bert. arXiv preprint arXiv:1901.04085 (2019)
Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Piroi, F., Hanbury, A.: Multilingual patent text retrieval evaluation: CLEF–IP. In: Information Retrieval Evaluation in a Changing World. TIRS, vol. 41, pp. 365–387. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22948-1_15
Chapter Google Scholar
Qu, C., Yang, L., Chen, C., Qiu, M., Croft, W.B., Iyyer, M.: Open-retrieval conversational question answering. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 539–548 (2020)
Google Scholar
Rabelo, J., Kim, M.-Y., Goebel, R., Yoshioka, M., Kano, Y., Satoh, K.: COLIEE 2020: methods for legal document retrieval and entailment. In: Okazaki, N., Yada, K., Satoh, K., Mineshima, K. (eds.) JSAI-isAI 2020. LNCS (LNAI), vol. 12758, pp. 196–210. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79942-7_13
Chapter Google Scholar
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992. Association for Computational Linguistics, Hong Kong (2019). https://doi.org/10.18653/v1/D19-1410, https://aclanthology.org/D19-1410
Rosa, G.M., Rodrigues, R.C., Lotufo, R., Nogueira, R.: Yes, bm25 is a strong baseline for legal case retrieval. arXiv preprint arXiv:2105.05686 (2021)
Russell-Rose, T., Chamberlain, J., Azzopardi, L.: Information retrieval in the workplace: a comparison of professional search practices. Inf. Process. Manag. 54(6), 1042–1057 (2018)
Article Google Scholar
Shao, Y., et al.: Bert-pli: modeling paragraph-level interactions for legal case retrieval. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pp. 3501–3507. International Joint Conferences on Artificial Intelligence Organization (2020). https://doi.org/10.24963/ijcai.2020/484
Verberne, S., et al.: First international workshop on professional search. In: ACM SIGIR Forum, vol. 52, pp. 153–162. ACM, New York (2019)
Google Scholar
Wolf, T., et al.: Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45. Association for Computational Linguistics (2020). https://www.aclweb.org/anthology/2020.emnlp-demos.6
Yang, E., Lewis, D.D., Frieder, O., Grossman, D.A., Yurchak, R.: Retrieval and richness when querying by document. In: DESIRES, pp. 68–75 (2018)
Google Scholar
Yang, Y., Bansal, N., Dakka, W., Ipeirotis, P., Koudas, N., Papadias, D.: Query by document. In: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pp. 34–43 (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Leiden University, Leiden, The Netherlands
Amin Abolghasemi & Suzan Verberne
University of Strathclyde, Glasgow, UK
Leif Azzopardi

Authors

Amin Abolghasemi
View author publications
You can also search for this author in PubMed Google Scholar
Suzan Verberne
View author publications
You can also search for this author in PubMed Google Scholar
Leif Azzopardi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Amin Abolghasemi .

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Abolghasemi, A., Verberne, S., Azzopardi, L. (2022). Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_1

Download citation

DOI: https://doi.org/10.1007/978-3-030-99739-7_1
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Learning Query-Space Document Representations for High-Recall Retrieval

Incorporating Ranking Context for End-to-End BERT Re-ranking

Multi-stage enhanced representation learning for document reranking based on query view

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Learning Query-Space Document Representations for High-Recall Retrieval

Incorporating Ranking Context for End-to-End BERT Re-ranking

Multi-stage enhanced representation learning for document reranking based on query view

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation