FewJoint: few-shot learning for joint dialogue understanding

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Few-shot learning (FSL) is a key next step for machine learning and has attracted considerable attention. In this paper, we focus on the FSL problem of dialogue understanding, which comprises two closely related tasks: intent detection and slot filling. Dialogue understanding has been shown to benefit substantially from learning the two sub-tasks jointly. However, such joint learning becomes challenging in few-shot scenarios: on the one hand, the sparsity of samples greatly magnifies the difficulty of modeling the connection between the two tasks; on the other hand, how to jointly learn multiple tasks in the few-shot setting remains under-investigated. In response, we introduce FewJoint, the first FSL benchmark for joint dialogue understanding. FewJoint provides a new corpus covering 59 dialogue domains from a real industrial API, together with a code platform that eases FSL experiment set-up, which we expect to advance research in this field. Further, we find that the limited performance typical of the few-shot setting often leads to noisy sharing between the two sub-tasks and disturbs joint learning. To tackle this, we guide slot filling with explicit intent information and propose a novel trust gating mechanism that blocks low-confidence intent information to ensure high-quality sharing. In addition, we introduce a Reptile-based meta-learning strategy to achieve better generalization on unseen few-shot domains. In experiments, the proposed method brings significant improvements on two datasets and achieves new state-of-the-art performance.
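The Reptile-based strategy mentioned in the abstract builds on the first-order meta-learning update of Nichol et al. [44]: adapt the model on one sampled task with a few SGD steps, then move the meta-parameters toward the adapted weights. A minimal sketch of one meta-update, assuming a generic PyTorch model and task batches (names and hyperparameters are illustrative, not the paper's code):

```python
import torch

def reptile_step(model, task_batches, loss_fn,
                 inner_lr=1e-2, inner_steps=5, meta_lr=0.1):
    """One Reptile meta-update (Nichol et al., 2018): run a few SGD steps
    on a sampled task, then move the initial weights toward the adapted ones."""
    # Snapshot the meta-parameters before task adaptation.
    init = {n: p.detach().clone() for n, p in model.named_parameters()}

    # Inner loop: plain SGD on the sampled task's support batches.
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    batches = iter(task_batches)
    for _ in range(inner_steps):
        x, y = next(batches)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # Outer (meta) update: theta <- theta + meta_lr * (theta_adapted - theta).
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.copy_(init[n] + meta_lr * (p - init[n]))
```

Because the outer update only interpolates between weight snapshots, no second-order gradients are needed, which keeps the meta-training cost close to ordinary fine-tuning.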




Notes

  1. The benchmark is used in the FSL contest of SMP2020-ECDT task-1. The dataset and FSL platform are available at https://github.com/AtmaHou/MetaDialog and the code for our proposed model will be released after the reviewing process.

  2. In practice, we find that the averaged token embedding represents a sentence better than the [CLS] token embedding.

  3. http://aiui.xfyun.cn/index-aiui.

  4. Benchmark users are free to re-construct the training set into any format.

  5. The Evaluation of Chinese Human-Computer Dialogue Technology, SMP2020-ECDT task-1. Link: https://smp2020.aconf.cn/smp.html.

  6. We choose 1 and 5 shots because they are common experiment settings in few-shot learning studies.

  7. https://github.com/AtmaHou/MetaDialog

  8. Note that baseline results on FewJoint are slightly higher than those reported in the ConProm paper [27]. This is because we conduct experiments on a refined version of FewJoint, which fixes errors in the original version.

  9. During the simulation of 1-shot scenarios, each slot tag is sampled to appear at least once, which leads to over-sampling of intents that co-occur with more slots.
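The averaged-token-embedding choice in note 2 amounts to mean pooling over non-padding tokens; a minimal sketch of that pooling, assuming BERT-style hidden states and an attention mask (the masking logic is our illustration, not the authors' code):

```python
import torch

def mean_pool(hidden_states, attention_mask):
    """Average token embeddings over real (non-padding) tokens,
    as an alternative to using only the [CLS] vector.

    hidden_states:  (batch, seq_len, hidden)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).float()   # (B, T, 1)
    summed = (hidden_states * mask).sum(dim=1)    # (B, H)
    counts = mask.sum(dim=1).clamp(min=1e-9)      # (B, 1), avoid div by zero
    return summed / counts
```

Masking before averaging matters: without it, padding vectors would dilute the sentence representation for short utterances.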

References

  1. Alex N, Lifland E, Tunstall L, et al (2021) Raft: A real-world few-shot text classification benchmark. In: NeurIPS Datasets and Benchmarks Track (Round 2)

  2. Antoniou A, Edwards H, Storkey A (2019) How to train your MAML. In: Proc. of ICLR

  3. Baik S, Choi M, Choi J, et al (2020) Meta-learning with adaptive hyperparameters. In: NeurIPS

  4. Bao Y, Wu M, Chang S, et al (2019) Few-shot text classification with distributional signatures. In: Proc. of ICLR

  5. Bhathiya HS, Thayasivam U (2020) Meta learning for few-shot joint intent detection and slot-filling. In: ICMLT, pp 86–92

  6. Budzianowski P, Wen TH, Tseng BH, et al (2018) Multiwoz-a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In: Proc. of EMNLP, pp 5016–5026

  7. Cao K, Brbic M, Leskovec J (2021) Concept learners for few-shot learning. In: Proc. of ICLR

  8. Chada R, Natarajan P (2021) Fewshotqa: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models. In: Proc. of EMNLP, pp 6081–6090

  9. Chen Q, Zhuo Z, Wang W (2019) BERT for joint intent classification and slot filling. arXiv preprint arXiv:1902.10909

  10. Chen Z, Ge J, Zhan H, et al (2021) Pareto self-supervised training for few-shot learning. In: Proc. of CVPR, pp 13,663–13,672

  11. Coucke A, Saade A, Ball A, et al (2018) Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. CoRR abs/1805.10190

  12. Das SSS, Katiyar A, Passonneau RJ, et al (2022) Container: Few-shot named entity recognition via contrastive learning. In: Proc. of ACL

  13. Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proc. of NAACL-HLT, pp 4171–4186

  14. Ding N, Xu G, Chen Y, et al (2021) Few-nerd: A few-shot named entity recognition dataset. In: Proc. of ACL-IJCNLP, pp 3198–3213

  15. Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(7)

  16. Eric M, Goel R, Paul S, et al (2020) Multiwoz 2.1: A consolidated multi-domain dialogue dataset with state corrections and state tracking baselines. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp 422–428

  17. Fei-Fei L (2006) Knowledge transfer in learning to recognize visual objects classes. In: ICDL, pp 1–8

  18. Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE transactions on pattern analysis and machine intelligence 28(4):594–611


  19. Fink M (2005) Object classification from a single example utilizing class relevance metrics. In: NeurIPS, pp 449–456

  20. Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML, pp 1126–1135

  21. Gao T, Han X, Zhu H, et al (2019) Fewrel 2.0: Towards more challenging few-shot relation classification. In: Proc. of EMNLP-IJCNLP

  22. Goo CW, Gao G, Hsu YK, et al (2018) Slot-gated modeling for joint slot filling and intent prediction. In: Proc. of NAACL-HLT, pp 753–757

  23. Goyal P, Dollár P, Girshick R, et al (2017) Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677

  24. Henderson M, Vulić I (2021) Convex: Data-efficient and few-shot slot labeling. In: Proc. of NAACL-HLT, pp 3375–3389

  25. Hou Y, Liu Y, Che W, et al (2018) Sequence-to-sequence data augmentation for dialogue language understanding. In: Proc. of COLING, pp 1234–1245

  26. Hou Y, Che W, Lai Y, et al (2020) Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network. In: Proc. of ACL

  27. Hou Y, Lai Y, Chen C, et al (2021a) Learning to bridge metric spaces: Few-shot joint learning of intent detection and slot filling. In: Findings of ACL-IJCNLP, pp 3190–3200

  28. Hou Y, Lai Y, Wu Y, et al (2021b) Few-shot learning for multi-label intent detection. In: Proc. of AAAI, pp 13,036–13,044

  29. Kelley JF (1984) An iterative design methodology for user-friendly natural language office information applications. ACM Transactions on Information Systems (TOIS) 2(1):26–41


  30. Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: Proc. of ICLR

  31. Krone J, Zhang Y, Diab M (2020) Learning to classify intents and slot labels given a handful of examples. In: Proc. of the 2nd Workshop on Natural Language Processing for Conversational AI

  32. La Gatta V, Moscato V, Postiglione M, et al (2021) Few-shot named entity recognition with cloze questions. arXiv preprint arXiv:2111.12421

  33. Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338


  34. Lee K, Maji S, Ravichandran A, et al (2019) Meta-learning with differentiable convex optimization. In: Proc. of CVPR, pp 10,657–10,665

  35. Li C, Li L, Qi J (2018) A self-attentive model with gate mechanism for spoken language understanding. In: Proc. of EMNLP, pp 3824–3833

  36. Loshchilov I, Hutter F (2018) Decoupled weight decay regularization. In: Proc. of ICLR

  37. Malik V, Kumar A, Veppa J (2021) Exploring the limits of natural language inference based setup for few-shot intent detection. arXiv preprint arXiv:2112.07434

  38. Mangla P, Kumari N, Sinha A, et al (2020) Charting the right manifold: Manifold mixup for few-shot learning. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 2218–2227

  39. Meihao F (2021) Few-shot multi-hop question answering over knowledge base. arXiv preprint arXiv:2112.11909

  40. Miller EG, Matsakis NE, Viola PA (2000) Learning from one example through shared densities on transforms. In: Proc. of CVPR, pp 464–471

  41. Min S, Lewis M, Hajishirzi H, et al (2022) Noisy channel language model prompting for few-shot text classification. In: Proc. of ACL

  42. Mittal A, Bharadwaj S, Khare S, et al (2021) Representation based meta-learning for few-shot spoken intent recognition. arXiv preprint arXiv:2106.15238

  43. Mukherjee S, Liu X, Zheng G, et al (2021) Clues: Few-shot learning evaluation in natural language understanding. arXiv preprint arXiv:2111.02570

  44. Nichol A, Achiam J, Schulman J (2018) On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999

  45. Oguz C, Vu NT (2021) Few-shot learning for slot tagging with attentive relational network. In: Proc. of EACL: Main Volume, pp 1566–1572

  46. Peng B, Zhu C, Li C, et al (2020) Few-shot natural language generation for task-oriented dialog. In: Findings of EMNLP, pp 172–182

  47. Qin L, Che W, Li Y, et al (2019) A stack-propagation framework with token-level intent detection for spoken language understanding. In: Proc. of EMNLP-IJCNLP, pp 2078–2087

  48. Qin L, Xu X, Che W, et al (2020) Towards fine-grained transfer: An adaptive graph-interactive framework for joint multiple intent detection and slot filling. In: Findings of EMNLP, pp 1807–1816

  49. Qin L, Li Z, Che W, et al (2021a) Co-gat: A co-interactive graph attention network for joint dialog act recognition and sentiment classification. In: Proc. of AAAI, pp 13,709–13,717

  50. Qin L, Liu T, Che W, et al (2021b) A co-interactive transformer for joint slot filling and intent detection. In: ICASSP, pp 8193–8197

  51. Reimers N, Gurevych I (2017) Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. In: Proc. of EMNLP

  52. Rizve MN, Khan S, Khan FS, et al (2021) Exploring complementary strengths of invariant and equivariant representations for few-shot learning. In: Proc. of CVPR, pp 10,836–10,846

  53. Rusu AA, Rao D, Sygnowski J, et al (2018) Meta-learning with latent embedding optimization. In: Proc. of ICLR

  54. Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. In: NeurIPS, pp 4077–4087

  55. Tian Y, Wang Y, Krishnan D, et al (2020) Rethinking few-shot image classification: a good embedding is all you need? In: ECCV, pp 266–282

  56. Tong M, Wang S, Xu B, et al (2021) Learning from miscellaneous other-class words for few-shot named entity recognition. In: Proc. of ACL-IJCNLP

  57. Triantafillou E, Larochelle H, Zemel R, et al (2021) Learning a universal template for few-shot dataset generalization. In: ICML, pp 10,424–10,433

  58. Vinyals O, Blundell C, Lillicrap T, et al (2016) Matching networks for one shot learning. In: NeurIPS

  59. Wang B, Li L, Verma M, et al (2021a) Mtunet: Few-shot image classification with visual explanations. In: Proc. of CVPR, pp 2294–2298

  60. Wang H, Wang Z, Fung GPC, et al (2021b) Mcml: A novel memory-based contrastive meta-learning method for few shot slot tagging. arXiv preprint arXiv:2108.11635

  61. Wang J, Wang KC, Rudzicz F, et al (2021c) Grad2task: Improved few-shot text classification using gradients for task representation. In: NeurIPS

  62. Wang Y, Chu H, Zhang C, et al (2021d) Learning from language description: Low-shot named entity recognition via decomposed framework. In: Findings of EMNLP, pp 1618–1630

  63. Wei P, Zeng B, Liao W (2022) Joint intent detection and slot filling with wheel-graph attention networks. Journal of Intelligent & Fuzzy Systems pp 2409–2420

  64. Worsham J, Kalita J (2020) Multi-task learning for natural language processing in the 2020s: where are we going? Pattern Recognition Letters

  65. Xu L, Lu X, Yuan C, et al (2021) Fewclue: A chinese few-shot learning evaluation benchmark. arXiv preprint arXiv:2107.07498

  66. Xu W, Wang H, Tu Z, et al (2020) Attentional constellation nets for few-shot learning. In: Proc. of ICLR

  67. Yang S, Zhang Y, Niu G, et al (2021) Entity concept-enhanced few-shot relation extraction. In: Proc. of ACL (Volume 2: Short Papers), pp 987–991

  68. Ye HJ, Hu H, Zhan DC, et al (2020) Few-shot learning via embedding adaptation with set-to-set functions. In: Proc. of CVPR, pp 8808–8817

  69. Young S, Gašić M, Thomson B, et al (2013) POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, pp 1160–1179

  70. Yu D, He L, Zhang Y, et al (2021) Few-shot intent classification and slot filling with retrieved examples. In: Proc. of NAACL-HLT

  71. Zhang C, Cai Y, Lin G, et al (2020) Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. In: Proc. of CVPR, pp 12,203–12,213

  72. Zhang J, Bui T, Yoon S, et al (2021a) Few-shot intent detection via contrastive pre-training and fine-tuning. In: Proc. of EMNLP, pp 1906–1912

  73. Zhang L, Shi Y, Shou L, et al (2021b) A joint and domain-adaptive approach to spoken language understanding. arXiv preprint arXiv:2107.11768

  74. Zheng Y, Zhou J, Qian Y, et al (2022) Fewnlu: Benchmarking state-of-the-art methods for few-shot natural language understanding. In: Proc. of ACL

  75. Zhu Q, Huang K, Zhang Z et al (2020) Crosswoz: A large-scale chinese cross-domain task-oriented dialogue dataset. Transactions of the Association for Computational Linguistics 8:281–295


Acknowledgments

We are grateful for the helpful comments and suggestions from the anonymous reviewers. This work was supported by the National Key R&D Program of China via grant 2020AAA0106501 and the National Natural Science Foundation of China (NSFC) via grants 61976072 and 62176078.

Author information

Corresponding author

Correspondence to Wanxiang Che.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hou, Y., Wang, X., Chen, C. et al. FewJoint: few-shot learning for joint dialogue understanding. Int. J. Mach. Learn. & Cyber. 13, 3409–3423 (2022). https://doi.org/10.1007/s13042-022-01604-9
