Abstract
Multilingual pre-trained language models (PLMs) enable zero-shot cross-lingual transfer from high-resource languages to low-resource languages in extractive question answering (QA). However, during fine-tuning on the QA task, the syntactic information encoded in multilingual PLMs is not always preserved and may even be forgotten, which can degrade answer span detection for low-resource languages. In this paper, we propose an auxiliary task of predicting syntactic graphs that enriches syntactic information during QA fine-tuning and thereby improves answer span detection for low-resource languages. The syntactic graph combines Part-of-Speech (POS) information with syntax tree information, without dependency parse labels. To fit the sequential input of PLMs, we decompose syntactic graph prediction into two subtasks: a POS tag prediction task and a syntax tree prediction task (predicting the tree depth of each word and the tree distance between each pair of words). Moreover, to improve alignment between languages, we train the syntactic graph prediction task on the source and target languages in parallel. Extensive experiments on three multilingual QA datasets demonstrate the effectiveness of our proposed approach.
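To make the two subtasks concrete, below is a minimal sketch (not the authors' released code) of auxiliary heads that could sit on top of a multilingual PLM encoder such as mBERT or XLM-R: a token-level classifier for POS tags, and a structural-probe-style projection in which squared vector norms approximate tree depths and squared pairwise distances approximate tree distances. All names here (`SyntaxHeads`, `pos_head`, `probe`) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SyntaxHeads(nn.Module):
    """Hypothetical auxiliary heads for the two syntactic subtasks."""

    def __init__(self, hidden_size: int, num_pos_tags: int, probe_rank: int = 128):
        super().__init__()
        # Subtask 1: POS tag prediction (token-level classification).
        self.pos_head = nn.Linear(hidden_size, num_pos_tags)
        # Subtask 2: syntax tree prediction via a low-rank projection in the
        # style of a structural probe: squared L2 norms stand in for tree
        # depths, squared L2 distances for pairwise tree distances.
        self.probe = nn.Linear(hidden_size, probe_rank, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size) from the PLM encoder.
        pos_logits = self.pos_head(hidden_states)     # (B, T, num_pos_tags)
        proj = self.probe(hidden_states)              # (B, T, probe_rank)
        depth_pred = (proj ** 2).sum(-1)              # (B, T): depth of each word
        diff = proj.unsqueeze(2) - proj.unsqueeze(1)  # (B, T, T, probe_rank)
        dist_pred = (diff ** 2).sum(-1)               # (B, T, T): word-pair distances
        return pos_logits, depth_pred, dist_pred
```

In such a setup, the losses from these heads would be added to the span-extraction loss during fine-tuning, and the same heads would be applied to parallel source- and target-language sentences to encourage cross-lingual alignment.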
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wu, L., Zhu, J., Zhang, X., Zhuang, Z., Feng, Z. (2022). Enhancing Low-Resource Languages Question Answering with Syntactic Graph. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_13
DOI: https://doi.org/10.1007/978-3-031-11217-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11216-4
Online ISBN: 978-3-031-11217-1