Abstract
Task-oriented dialogue systems have made unprecedented progress, with multiple state-of-the-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. However, dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been made to rectify the annotation errors present in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ 2.3, in which we distinguish incorrect annotations in dialogue acts from those in dialogue states and identify a lack of co-reference annotation when publishing the updated dataset. To ensure consistency between dialogue acts and dialogue states, we implement co-reference features and unify the annotations of dialogue acts and dialogue states. We update the state-of-the-art performance of natural language understanding and dialogue state tracking on MultiWOZ 2.3, where the results show significant improvements over those on previous versions of the MultiWOZ dataset (2.0–2.2).
T. Han and X. L—Both authors contributed equally to the work. The work was conducted when Ting Han interned at AARC.
Notes
1. https://github.com/budzianowski/multiwoz. Marked date: 6/1/2021.
2. https://github.com/lexmen318/MultiWOZ-coref. Please be aware that all associated appendices are presented separately at the GitHub link due to the page limit.
3. Statistics on the types of corrections to the “metadata” annotations are presented in Appendix A.
4. Examples of inconsistent tracking are presented in Appendix B.
5. Statistics on the amount of co-reference annotation for each slot are presented in Appendix C.
6. A sample co-reference annotation is presented in Appendix D.
7. Full benchmarks with various models are available in Appendix E.
8. Scores shown in Table 7 are achieved by using the pre-processing scripts provided by SUMBT and TRADE.
9. Details of the corrections are shown in Appendix F.
References
Budzianowski, P., Wen, T.H., Tseng, B.H., Casanueva, I., Ultes, S., Ramadan, O., Gašić, M.: MultiWOZ: a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In: EMNLP, Brussels, pp. 5016–5026 (2018)
Mehri, S., Eric, M., Hakkani-Tur, D.: DialoGLUE: a natural language understanding benchmark for task-oriented dialogue. arXiv preprint arXiv:2009.13570 (2020)
Wang, Y., Guo, Y., Zhu, S.: Slot attention with value normalization for multi-domain dialogue state tracking. In: EMNLP, pp. 3019–3028, November 2020
Kim, S., Yang, S., Kim, G., Lee, S. W.: Efficient dialogue state tracking by selectively overwriting memory. In: ACL, pp. 567–582, July 2020
Ren, L., Ni, J., McAuley, J.: Scalable and accurate dialogue state tracking via hierarchical sequence generation. In: EMNLP-IJCNLP, Hong Kong, pp. 1876–1885, November 2019
Takanobu, R., Zhu, H., Huang, M.: Guided dialog policy learning: Reward estimation for multi-domain task-oriented dialog. In: EMNLP-IJCNLP, Hong Kong, pp. 100–110, November 2019
Schatzmann, J., Thomson, B., Weilhammer, K., Ye, H., Young, S.: Agenda-based user simulation for bootstrapping a POMDP dialogue system. In: NAACL-HLT, Companion Volume, pp. 149–152. Rochester, April 2007
Gür, I., Hakkani-Tür, D., Tür, G., Shah, P.: User modeling for task oriented dialogues. In: IEEE-SLT, Athens, pp. 900–906, December 2018
Chen, W., Chen, J., Qin, P., Yan, X., Wang, W.Y.: Semantically conditioned dialog response generation via hierarchical disentangled self-attention. In: ACL, Florence, pp. 3696–3709, July 2019
Zhang, J.G., Hashimoto, K., Wu, C.S., Wan, Y., Yu, P.S., Socher, R., Xiong, C.: Find or classify? Dual strategy for slot-value predictions on multi-domain dialog state tracking. arXiv preprint arXiv:1910.03544 (2019)
Zhao, T., Xie, K., Eskenazi, M.: Rethinking action spaces for reinforcement learning in end-to-end dialog agents with latent variable models. In: NAACL-HLT, Volume 1 (Long and Short Papers), Minneapolis, pp. 1208–1218, June 2019
Rastogi, A., Zang, X., Sunkara, S., Gupta, R., Khaitan, P.: Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. In: AAAI, New York, pp. 8689–8696, April 2020
Wen, T.H., et al.: A network-based end-to-end trainable task-oriented dialogue system. In: EACL, Valencia, pp. 438–449, January 2017
Williams, J., Raux, A., Ramachandran, D., Black, A.: The dialog state tracking challenge. In: SIGDIAL, Metz, pp. 404–413 (2013)
Henderson, M., Thomson, B., Williams, J.D.: The second dialog state tracking challenge. In: SIGDIAL, Philadelphia, pp. 263–272 (2014)
Eric, M., et al.: MultiWOZ 2.1: a consolidated multi-domain dialogue dataset with state corrections and state tracking baselines. In: LREC, Marseille, pp. 422–428 (2020)
Zhu, Q., Huang, K., Zhang, Z., Zhu, X., Huang, M.: CrossWOZ: a large-scale Chinese cross-domain task-oriented dialogue dataset. TACL 8, 281–295 (2020)
Zang, X., Rastogi, A., Sunkara, S., Gupta, R., Zhang, J., Chen, J.: MultiWOZ 2.2: a dialogue dataset with additional annotation corrections and state tracking baselines. In: ACL, pp. 109–117 (2020)
Zhu, Q., et al.: ConvLab-2: an open-source toolkit for building, evaluating, and diagnosing dialogue systems. In: ACL, System Demonstrations, pp. 142–149, July 2020
Gao, S., Sethi, A., Agarwal, S., Chung, T., Hakkani-Tur, D.: Dialog state tracking: a neural reading comprehension approach. In: SIGDIAL, Stockholm, pp. 264–273 (2019)
Wu, C.S., Madotto, A., Hosseini-Asl, E., Xiong, C., Socher, R., Fung, P.: Transferable multi-domain state generator for task-oriented dialogue systems. In: ACL, Florence, pp. 808–819, July 2019
Lee, H., Lee, J., Kim, T.Y.: SUMBT: slot-utterance matching for universal and scalable belief tracking. In: ACL, Florence, pp. 5478–5483, July 2019
Zhou, L., Small, K.: Multi-domain dialogue state tracking as dynamic knowledge graph enhanced question answering. arXiv preprint arXiv:1911.06192 (2019)
Heck, M., et al.: TripPy: a triple copy strategy for value independent neural dialog state tracking. In: SIGDIAL, pp. 35–44, July 2020
Pan, Z., Bai, K., Wang, Y., Zhou, L., Liu, X.: Improving open-domain dialogue systems via multi-turn incomplete utterance restoration. In: EMNLP-IJCNLP, Hong Kong, pp. 1824–1833, November 2019
Quan, J., Xiong, D., Webber, B., Hu, C.: GECOR: an end-to-end generative ellipsis and co-reference resolution model for task-oriented dialogue. In: EMNLP-IJCNLP, Hong Kong, pp. 4539–4549, November 2019
Su, H., et al.: Improving multi-turn dialogue modelling with utterance ReWriter. In: ACL, Florence, pp. 22–31, July 2019
Ferreira Cruz, A., Rocha, G., Lopes Cardoso, H.: Coreference resolution: toward end-to-end and cross-lingual systems. Information 11(2), 74 (2020)
Lee, S., et al.: ConvLab: multi-domain end-to-end dialog system platform. In: ACL, Florence, pp. 64–69, July 2019
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT, Volume 1 (Long and Short Papers), pp. 4171–4186. Minneapolis, June 2019
Chen, Q., Zhuo, Z., Wang, W.: BERT for joint intent classification and slot filling. arXiv preprint arXiv:1902.10909 (2019)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Han, T. et al. (2021). MultiWOZ 2.3: A Multi-domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13029. Springer, Cham. https://doi.org/10.1007/978-3-030-88483-3_16
DOI: https://doi.org/10.1007/978-3-030-88483-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88482-6
Online ISBN: 978-3-030-88483-3
eBook Packages: Computer Science, Computer Science (R0)