Abstract
For a given query, the objective of Cross-lingual Passage Re-ranking (XPR) is to rank a list of candidate passages in multiple languages, where only a portion of the passages are in the query's language. Multilingual BERT (mBERT) is often used for the XPR task and achieves impressive performance. Nevertheless, two essential issues in mBERT remain to be addressed: the performance gap between high- and low-resource languages, and the lack of explicit alignment of embedding distributions across languages. Treating each language as a separate domain, we theoretically explore, through the lens of domain adaptation, how these issues lead to errors in XPR. Guided by this analysis, we propose a novel framework comprising two modules: knowledge distillation and adversarial learning. The former transfers knowledge from high-resource languages to low-resource ones, narrowing their performance gap. The latter encourages mBERT to align embedding distributions across languages through a novel language-discrimination task and adversarial training. Extensive experiments on in-domain and out-of-domain datasets confirm the effectiveness and robustness of the proposed framework and show that it outperforms state-of-the-art methods.
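The abstract only names the two modules, so the paper's actual implementation is not reproduced here; the sketch below is a minimal, hypothetical PyTorch rendering of how such a pair of modules is commonly realized. All identifiers (GradReverse, LanguageDiscriminator, kd_loss, the temperature T, and the reversal weight lambd) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the two modules described in the abstract.
# All names and hyperparameters here are illustrative assumptions,
# not the authors' actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass, negated
    (scaled) gradient on the backward pass, so the encoder is trained
    to fool the language discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class LanguageDiscriminator(nn.Module):
    """Predicts the language of a passage embedding; trained
    adversarially against the encoder to encourage language-invariant
    (i.e., aligned) embedding distributions."""

    def __init__(self, hidden_dim: int, num_languages: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_languages),
        )

    def forward(self, embeddings: torch.Tensor, lambd: float = 1.0):
        reversed_emb = GradReverse.apply(embeddings, lambd)
        return self.classifier(reversed_emb)


def kd_loss(student_scores, teacher_scores, T: float = 2.0):
    """Soft-label distillation: the low-resource (student) branch
    matches the high-resource (teacher) branch's softened relevance
    distribution over candidate passages."""
    return F.kl_div(
        F.log_softmax(student_scores / T, dim=-1),
        F.softmax(teacher_scores / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```

In a setup like this, a joint objective would sum the re-ranking loss, kd_loss between the high- and low-resource branches, and the discriminator's cross-entropy; the gradient reversal pushes the shared encoder toward embeddings the discriminator cannot separate by language, which is one standard way to realize the distribution alignment the abstract describes.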
Data availability statement
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.