MultiWOZ 2.3: A Multi-domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation

Han, Ting; Liu, Ximing; Takanabu, Ryuichi; Lian, Yixin; Huang, Chongxuan; Wan, Dazhen; Peng, Wei; Huang, Minlie

doi:10.1007/978-3-030-88483-3_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13029)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Task-oriented dialogue systems have made unprecedented progress, with multiple state-of-the-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. Dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been made to rectify the annotation errors present in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ 2.3, in which we differentiate incorrect annotations in dialogue acts from those in dialogue states, identifying a lack of co-reference annotation when publishing the updated dataset. To ensure consistency between dialogue acts and dialogue states, we implement co-reference features and unify the annotations of dialogue acts and dialogue states. We update the state-of-the-art performance of natural language understanding and dialogue state tracking on MultiWOZ 2.3, where the results show significant improvements over previous versions of the MultiWOZ dataset (2.0–2.2).
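
As a rough illustration of what consistency between dialogue acts and dialogue states can mean in practice, the following minimal sketch flags turns where an informed value in the dialogue acts disagrees with the tracked state. The tuple and dictionary layout, the function name `find_inconsistencies`, and the sample values are assumptions made purely for illustration; they are not the MultiWOZ 2.3 file format and not the authors' correction procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified turn representation (NOT the actual MultiWOZ 2.3 schema):
#   dialogue acts:  list of (intent, domain, slot, value) tuples
#   dialogue state: dict mapping "domain-slot" to the tracked value
DialogueAct = Tuple[str, str, str, str]
DialogueState = Dict[str, str]


def find_inconsistencies(acts: List[DialogueAct], state: DialogueState) -> List[str]:
    """Return one message per 'inform' act whose value disagrees with the state."""
    problems = []
    for intent, domain, slot, value in acts:
        if intent != "inform":
            continue  # in this sketch, only informed values are compared to the state
        key = f"{domain}-{slot}"
        tracked = state.get(key)
        if tracked is None:
            problems.append(f"{key}: informed '{value}' but missing from state")
        elif tracked.lower() != value.lower():
            problems.append(f"{key}: act says '{value}', state says '{tracked}'")
    return problems


# Toy turn: the act and the state spell the area value differently.
acts = [
    ("inform", "restaurant", "area", "centre"),
    ("inform", "restaurant", "food", "italian"),
    ("request", "restaurant", "phone", "?"),
]
state = {"restaurant-area": "center", "restaurant-food": "italian"}

for message in find_inconsistencies(acts, state):
    print(message)
```

A check of this kind only surfaces candidate mismatches; deciding which side is wrong, and resolving values that depend on co-reference to earlier turns, still requires the kind of annotation work the paper describes.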

T. Han and X. Liu contributed equally to this work. The work was conducted while Ting Han was an intern at AARC.

Notes

1. https://github.com/budzianowski/multiwoz. Marked date: 6/1/2021.
2. https://github.com/lexmen318/MultiWOZ-coref. Please be aware that all associated appendices are presented separately at the GitHub link due to the page limit.
3. Statistics on the types of corrections to the “metadata” annotations are presented in Appendix A.
4. Examples of inconsistent tracking are presented in Appendix B.
5. Statistics on the amount of co-reference annotation for each slot are presented in Appendix C.
6. A sample co-reference annotation is presented in Appendix D.
7. Full benchmarks with various models are available in Appendix E.
8. Scores shown in Table 7 were obtained using the pre-processing scripts provided by SUMBT and TRADE.
9. Details of the corrections are shown in Appendix F.

Author information

Authors and Affiliations

University of Illinois at Chicago, Chicago, USA
Ting Han
Artificial Intelligence Application Research Center, AARC, Huawei Technologies, Shenzhen, China
Ximing Liu, Yixin Lian, Chongxuan Huang & Wei Peng
Tsinghua University, Beijing, China
Ryuichi Takanabu, Dazhen Wan & Minlie Huang

Corresponding authors

Correspondence to Wei Peng or Minlie Huang.

Editor information

Editors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Lu Wang
Peking University, Beijing, China
Yansong Feng
Soochow University, Suzhou, China
Yu Hong
Tianjin University, Tianjin, China
Ruifang He

About this paper

Cite this paper

Han, T. et al. (2021). MultiWOZ 2.3: A Multi-domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13029. Springer, Cham. https://doi.org/10.1007/978-3-030-88483-3_16

DOI: https://doi.org/10.1007/978-3-030-88483-3_16
Published: 06 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88482-6
Online ISBN: 978-3-030-88483-3
eBook Packages: Computer Science, Computer Science (R0)
