Abstract
Cross-lingual open-domain question answering (Open-QA) has become an increasingly important topic. Training a monolingual model typically requires a large amount of labeled data for supervised learning, which makes such models hard to deploy in real applications, especially for low-resource languages. Recently, thanks to the multilingual BERT model, a new task called zero-shot cross-lingual QA has emerged in this field: a model is trained on a resource-rich language and tested directly on other languages. Current research faces two main problems. The first lies in the document retrieval stage, where directly applying a multilingual pretrained model for similarity calculation yields insufficient retrieval accuracy. The second lies in the answer extraction stage, where answers involve different levels of abstraction relative to the retrieved documents and thus require deeper exploration. This paper proposes a cross-layer connection based approach for cross-lingual Open-QA, consisting of a Match-Retrieval module and a Connection-Extraction module. The matching network in the retrieval module heuristically adjusts and expands the learned features to improve retrieval quality. The answer extraction module reuses deep semantic features at the network structure level through cross-layer connections. Experimental results on a public cross-lingual Open-QA dataset show the superiority of our approach over state-of-the-art methods.
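To make the retrieval problem concrete, here is a minimal sketch of the naive baseline the abstract critiques: ranking candidate documents by cosine similarity between mean-pooled multilingual BERT vectors of the question and each document. This illustrates the baseline only, not the paper's Match-Retrieval module; the checkpoint name, mean pooling, and helper functions are our assumptions.

import torch
from transformers import BertModel, BertTokenizer

# Multilingual BERT as an off-the-shelf encoder (assumption: the
# standard Hugging Face checkpoint, not the authors' exact setup).
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
encoder.eval()

def embed(text):
    # Mean-pool last-layer token states into a single sentence vector.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        states = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden)
    return states.mean(dim=1).squeeze(0)              # (hidden,)

def rank_documents(question, documents):
    # Score each candidate document against the question vector.
    q = embed(question)
    scores = [torch.cosine_similarity(q, embed(d), dim=0).item()
              for d in documents]
    return sorted(zip(documents, scores), key=lambda x: -x[1])

Because multilingual BERT is not trained for cross-lingual sentence similarity, such direct scoring is exactly where the insufficient retrieval accuracy noted above comes from.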
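The cross-layer connection idea in the extraction module can likewise be sketched: rather than predicting answer spans from the final encoder layer alone, hidden states from several layers are concatenated so the span head can reuse semantic features at different depths. The chosen layers and the linear span head below are illustrative assumptions, not the paper's exact Connection-Extraction design.

import torch
import torch.nn as nn
from transformers import BertModel

class CrossLayerSpanExtractor(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased",
                 layers=(4, 8, 12)):
        super().__init__()
        # output_hidden_states=True exposes every encoder layer's output.
        self.encoder = BertModel.from_pretrained(
            model_name, output_hidden_states=True)
        self.layers = layers
        hidden = self.encoder.config.hidden_size
        # Concatenated multi-layer features feed a start/end span head.
        self.span_head = nn.Linear(hidden * len(layers), 2)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask)
        # hidden_states[0] is the embedding layer; 1..12 are encoder layers.
        feats = torch.cat([out.hidden_states[i] for i in self.layers],
                          dim=-1)                     # (batch, seq, 3*hidden)
        logits = self.span_head(feats)                # (batch, seq, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)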
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful comments. This work was supported by NSFC funding (No. 61876062).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, L., Kong, M., Li, D., Zhou, D. (2020). A Cross-Layer Connection Based Approach for Cross-Lingual Open Question Answering. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol. 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_37
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9