MultiWOZ 2.3: A Multi-domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2021)

Abstract

Task-oriented dialogue systems have made unprecedented progress with multiple state-of-the-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. Dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been made to rectify the annotation errors present in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ 2.3, in which we differentiate incorrect annotations in dialogue acts from those in dialogue states, identifying a lack of co-reference when publishing the updated dataset. To ensure consistency between dialogue acts and dialogue states, we implement co-reference features and unify the annotations of dialogue acts and dialogue states. We update the state-of-the-art performance of natural language understanding and dialogue state tracking on MultiWOZ 2.3, where the results show significant improvements over previous versions of the MultiWOZ dataset (2.0–2.2).

T. Han and X. L—Both authors contributed equally to the work. The work was conducted when Ting Han interned at AARC.

Notes

  1. https://github.com/budzianowski/multiwoz. Marked date: 6/1/2021.

  2. https://github.com/lexmen318/MultiWOZ-coref. Please be aware that all associated appendices are presented separately at the GitHub link due to the page limit.

  3. Statistics on the types of corrections to the “metadata” annotations are presented in Appendix A.

  4. Examples of inconsistent tracking are presented in Appendix B.

  5. Statistics on the amount of co-reference annotation for each slot are presented in Appendix C.

  6. A sample co-reference annotation is presented in Appendix D.

  7. Full benchmarks with various models are available in Appendix E.

  8. Scores shown in Table 7 are achieved by using the pre-processing scripts provided by SUMBT and TRADE.

  9. Details of the corrections are shown in Appendix F.

Corresponding authors

Correspondence to Wei Peng or Minlie Huang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Han, T. et al. (2021). MultiWOZ 2.3: A Multi-domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13029. Springer, Cham. https://doi.org/10.1007/978-3-030-88483-3_16

  • DOI: https://doi.org/10.1007/978-3-030-88483-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88482-6

  • Online ISBN: 978-3-030-88483-3

  • eBook Packages: Computer Science, Computer Science (R0)
