Abstract
Recent work in dialogue state tracking (DST) focuses on a handful of languages, as collecting large-scale manually annotated data in different languages is expensive. Existing models address this issue through code-switched data augmentation or intermediate fine-tuning of multilingual pre-trained models. However, these models can only perform implicit alignment across languages. In this paper, we propose a novel model named Contrastive Learning for Cross-Lingual DST (CLCL-DST) to enhance zero-shot cross-lingual adaptation. Specifically, we use a self-built bilingual dictionary for lexical substitution to construct multilingual views of the same utterance. Our approach then leverages fine-grained contrastive learning to encourage the representations of specific slot tokens in different views to be more similar than those of negative example pairs. In this way, CLCL-DST aligns similar words across languages into a more refined language-invariant space. In addition, CLCL-DST uses a significance-based keyword extraction approach to select task-related words when building the bilingual dictionary, yielding better cross-lingual positive examples. Experimental results on the Multilingual WoZ 2.0 and parallel MultiWoZ 2.1 datasets show that our proposed CLCL-DST outperforms existing state-of-the-art methods by a large margin, demonstrating its effectiveness.
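The two core ideas of the abstract, dictionary-based lexical substitution to build multilingual views and a token-level contrastive objective, can be illustrated with a minimal sketch. This is an assumption-laden toy implementation, not the paper's actual method: the bilingual dictionary entries, substitution ratio, and plain-list embeddings are all hypothetical stand-ins, and the loss is a standard InfoNCE formulation over cosine similarity.

```python
import math
import random

def code_switch(tokens, bilingual_dict, ratio=0.5, seed=0):
    """Build a multilingual view of an utterance by substituting words
    with their dictionary translations (a toy dictionary here; the paper
    builds its own from significance-based keyword extraction)."""
    rng = random.Random(seed)
    return [
        bilingual_dict[tok] if tok in bilingual_dict and rng.random() < ratio
        else tok
        for tok in tokens
    ]

def info_nce(anchor, positive, negatives, tau=0.1):
    """Token-level InfoNCE: pull the anchor slot-token embedding toward
    its cross-lingual positive and push it away from negatives."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    pos = math.exp(cos(anchor, positive) / tau)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

# Example: a code-switched view of a restaurant-domain utterance,
# with a made-up English-German word list.
utterance = ["book", "a", "cheap", "restaurant"]
toy_dict = {"cheap": "günstig", "restaurant": "Restaurant"}
view = code_switch(utterance, toy_dict, ratio=1.0)

# A well-aligned cross-lingual pair yields a lower contrastive loss
# than a mismatched one (2-d toy embeddings).
loss_aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]])
loss_mismatched = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]])
```

Minimizing this loss over slot tokens in the original and code-switched views is what drives representations of translation pairs toward a shared, language-invariant space.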
Acknowledgement
The research work described in this paper has been supported by the National Key R&D Program of China (2020AAA0108005), the National Natural Science Foundation of China (Nos. 61976015, 61976016, 61876198 and 61370130) and Toshiba (China) Co., Ltd. The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this paper.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xiang, Y. et al. (2023). Improving Zero-Shot Cross-Lingual Dialogue State Tracking via Contrastive Learning. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science(), vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6206-8
Online ISBN: 978-981-99-6207-5
eBook Packages: Computer Science (R0)