Abstract
Few-shot learning (FSL) is one of the key future directions of machine learning and has attracted considerable attention. In this paper, we focus on the FSL problem of dialogue understanding, which comprises two closely related tasks: intent detection and slot filling. Dialogue understanding has been shown to benefit greatly from jointly learning these two sub-tasks. However, such joint learning becomes challenging in few-shot scenarios: on the one hand, the sparsity of samples greatly magnifies the difficulty of modeling the connection between the two tasks; on the other hand, how to jointly learn multiple tasks in the few-shot setting remains under-investigated. In response, we introduce FewJoint, the first FSL benchmark for joint dialogue understanding. FewJoint provides a new corpus with 59 different dialogue domains from a real industrial API and a code platform that eases FSL experiment set-up, both of which are expected to advance research in this field. Further, we find that the limited accuracy attainable in the few-shot setting often leads to noisy sharing between the two sub-tasks and disturbs joint learning. To tackle this, we guide slot filling with explicit intent information and propose a novel trust gating mechanism that blocks low-confidence intent information to ensure high-quality sharing. Besides, we introduce a Reptile-based meta-learning strategy to achieve better generalization to unseen few-shot domains. In experiments, the proposed method brings significant improvements on two datasets and achieves new state-of-the-art performance.
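To make the two mechanisms above concrete, the following is a minimal sketch, not the authors' released implementation: the tensor shapes, the hard confidence threshold `tau`, and the additive gating form are our own illustrative assumptions.

```python
import torch
import torch.nn as nn

class TrustGate(nn.Module):
    """Sketch of a trust gate: block low-confidence intent information
    before it conditions slot filling (threshold and form are assumptions)."""

    def __init__(self, n_intents: int, hidden_dim: int, tau: float = 0.5):
        super().__init__()
        self.proj = nn.Linear(n_intents, hidden_dim)
        self.tau = tau

    def forward(self, token_hidden, intent_logits):
        # token_hidden: [batch, seq_len, hidden]; intent_logits: [batch, n_intents]
        probs = intent_logits.softmax(dim=-1)
        conf, _ = probs.max(dim=-1, keepdim=True)        # intent confidence, [batch, 1]
        gate = (conf > self.tau).float().unsqueeze(-1)   # [batch, 1, 1]
        intent_feat = self.proj(probs).unsqueeze(1)      # [batch, 1, hidden]
        # Low-confidence intent features are zeroed, so they cannot inject
        # noise into the slot-filling representation.
        return token_hidden + gate * intent_feat

def reptile_outer_update(model, adapted_state, epsilon=0.1):
    """Reptile outer step (Nichol et al. 2018): move the shared initialization
    toward the parameters adapted on one sampled few-shot domain."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            param.add_(epsilon * (adapted_state[name] - param))
```

The Reptile step uses only first-order information, which keeps per-episode meta-updates cheap while still pulling the initialization toward solutions that adapt quickly in unseen domains.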



Notes
The benchmark is used in the FSL contest of SMP2020-ECDT task-1. The dataset and FSL platform are available at https://github.com/AtmaHou/MetaDialog, and the code for our proposed model will be released after the reviewing process.
In practice, we find that the averaged token embedding represents a sentence better than the [CLS] token embedding.
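For concreteness, a minimal sketch of this pooling choice with the Hugging Face transformers API (the model name and surrounding details are our assumptions, not the benchmark code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed model
model = AutoModel.from_pretrained("bert-base-chinese")

def sentence_embedding(text: str) -> torch.Tensor:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state          # [1, seq_len, 768]
    mask = enc["attention_mask"].unsqueeze(-1).float()   # mask out padding
    # Average over real tokens instead of taking hidden[:, 0] (the [CLS] token).
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```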
Benchmark users are free to re-construct the training set into any format.
The Evaluation of Chinese Human-Computer Dialogue Technology, SMP2020-ECDT task-1. Link: https://smp2020.aconf.cn/smp.html.
We choose 1 and 5 shots because they are common experiment settings in few-shot learning studies.
Note that the baseline results on FewJoint are slightly higher than those reported in the ConProm paper [27]. This is because we conduct experiments on a refined version of FewJoint, which fixes errors in the original version.
During the simulation of 1-shot scenarios, each slot tag is sampled to appear at least once, which leads to over-sampling of intents that co-occur with many slot types.
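For concreteness, a greedy sketch of such a minimum-including constraint (the data layout and function below are illustrative assumptions, not the benchmark's released sampler):

```python
import random

def minimum_including_sample(utterances, k=1):
    """Greedily pick utterances until every slot tag appears at least k times.

    `utterances` is a list of (text, slot_tags) pairs; this layout is assumed.
    """
    counts = {tag: 0 for _, tags in utterances for tag in tags}
    support = []
    pool = list(utterances)
    random.shuffle(pool)
    for text, tags in pool:
        if any(counts[t] < k for t in tags):  # covers a still-missing tag
            support.append((text, tags))
            for t in tags:
                counts[t] += 1
        if all(c >= k for c in counts.values()):
            break
    return support
```

Because every tag must be covered, utterances from intents that co-occur with many slot types are selected disproportionately often, which produces the over-sampling effect described above.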
References
Alex N, Lifland E, Tunstall L, et al (2021) RAFT: A real-world few-shot text classification benchmark. In: NeurIPS Datasets and Benchmarks Track (Round 2)
Antoniou A, Edwards H, Storkey A (2019) How to train your MAML. In: Proc. of ICLR
Baik S, Choi M, Choi J, et al (2020) Meta-learning with adaptive hyperparameters. In: NeurIPS
Bao Y, Wu M, Chang S, et al (2019) Few-shot text classification with distributional signatures. In: Proc. of ICLR
Bhathiya HS, Thayasivam U (2020) Meta learning for few-shot joint intent detection and slot-filling. In: ICMLT, pp 86–92
Budzianowski P, Wen TH, Tseng BH, et al (2018) MultiWOZ: A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In: Proc. of EMNLP, pp 5016–5026
Cao K, Brbic M, Leskovec J (2021) Concept learners for few-shot learning. In: Proc. of ICLR
Chada R, Natarajan P (2021) FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models. In: Proc. of EMNLP, pp 6081–6090
Chen Q, Zhuo Z, Wang W (2019) BERT for joint intent classification and slot filling. arXiv preprint arXiv:1902.10909
Chen Z, Ge J, Zhan H, et al (2021) Pareto self-supervised training for few-shot learning. In: Proc. of CVPR, pp 13,663–13,672
Coucke A, Saade A, Ball A, et al (2018) Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. CoRR abs/1805.10190
Das SSS, Katiyar A, Passonneau RJ, et al (2022) CONTaiNER: Few-shot named entity recognition via contrastive learning. In: Proc. of ACL
Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proc. of NAACL-HLT, pp 4171–4186
Ding N, Xu G, Chen Y, et al (2021) Few-NERD: A few-shot named entity recognition dataset. In: Proc. of ACL-IJCNLP, pp 3198–3213
Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(7)
Eric M, Goel R, Paul S, et al (2020) MultiWOZ 2.1: A consolidated multi-domain dialogue dataset with state corrections and state tracking baselines. In: Proc. of LREC, pp 422–428
Fei-Fei L (2006) Knowledge transfer in learning to recognize visual objects classes. In: ICDL, pp 1–8
Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(4):594–611
Fink M (2005) Object classification from a single example utilizing class relevance metrics. In: NeurIPS, pp 449–456
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML, pp 1126–1135
Gao T, Han X, Zhu H, et al (2019) FewRel 2.0: Towards more challenging few-shot relation classification. In: Proc. of EMNLP-IJCNLP
Goo CW, Gao G, Hsu YK, et al (2018) Slot-gated modeling for joint slot filling and intent prediction. In: Proc. of NAACL-HLT, pp 753–757
Goyal P, Dollár P, Girshick R, et al (2017) Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677
Henderson M, Vulić I (2021) ConVEx: Data-efficient and few-shot slot labeling. In: Proc. of NAACL-HLT, pp 3375–3389
Hou Y, Liu Y, Che W, et al (2018) Sequence-to-sequence data augmentation for dialogue language understanding. In: Proc. of COLING, pp 1234–1245
Hou Y, Che W, Lai Y, et al (2020) Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network. In: Proc. of ACL
Hou Y, Lai Y, Chen C, et al (2021a) Learning to bridge metric spaces: Few-shot joint learning of intent detection and slot filling. In: Findings of ACL-IJCNLP, pp 3190–3200
Hou Y, Lai Y, Wu Y, et al (2021b) Few-shot learning for multi-label intent detection. In: Proc. of AAAI, pp 13,036–13,044
Kelley JF (1984) An iterative design methodology for user-friendly natural language office information applications. ACM Transactions on Information Systems (TOIS) 2(1):26–41
Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: Proc. of ICLR
Krone J, Zhang Y, Diab M (2020) Learning to classify intents and slot labels given a handful of examples. In: Proc. of the 2nd Workshop on Natural Language Processing for Conversational AI
La Gatta V, Moscato V, Postiglione M, et al (2021) Few-shot named entity recognition with cloze questions. arXiv preprint arXiv:2111.12421
Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338
Lee K, Maji S, Ravichandran A, et al (2019) Meta-learning with differentiable convex optimization. In: Proc. of CVPR, pp 10,657–10,665
Li C, Li L, Qi J (2018) A self-attentive model with gate mechanism for spoken language understanding. In: Proc. of EMNLP, pp 3824–3833
Loshchilov I, Hutter F (2018) Decoupled weight decay regularization. In: Proc. of ICLR
Malik V, Kumar A, Veppa J (2021) Exploring the limits of natural language inference based setup for few-shot intent detection. arXiv preprint arXiv:2112.07434
Mangla P, Kumari N, Sinha A, et al (2020) Charting the right manifold: Manifold mixup for few-shot learning. In: Proc. of WACV, pp 2218–2227
Meihao F (2021) Few-shot multi-hop question answering over knowledge base. arXiv preprint arXiv:2112.11909
Miller EG, Matsakis NE, Viola PA (2000) Learning from one example through shared densities on transforms. In: Proc. of CVPR, pp 464–471
Min S, Lewis M, Hajishirzi H, et al (2022) Noisy channel language model prompting for few-shot text classification. In: Proc. of ACL
Mittal A, Bharadwaj S, Khare S, et al (2021) Representation based meta-learning for few-shot spoken intent recognition. arXiv preprint arXiv:2106.15238
Mukherjee S, Liu X, Zheng G, et al (2021) CLUES: Few-shot learning evaluation in natural language understanding. arXiv preprint arXiv:2111.02570
Nichol A, Achiam J, Schulman J (2018) On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999
Oguz C, Vu NT (2021) Few-shot learning for slot tagging with attentive relational network. In: Proc. of EACL: Main Volume, pp 1566–1572
Peng B, Zhu C, Li C, et al (2020) Few-shot natural language generation for task-oriented dialog. In: Findings of EMNLP, pp 172–182
Qin L, Che W, Li Y, et al (2019) A stack-propagation framework with token-level intent detection for spoken language understanding. In: Proc. of EMNLP-IJCNLP, pp 2078–2087
Qin L, Xu X, Che W, et al (2020) Towards fine-grained transfer: An adaptive graph-interactive framework for joint multiple intent detection and slot filling. In: Findings of EMNLP, pp 1807–1816
Qin L, Li Z, Che W, et al (2021a) Co-GAT: A co-interactive graph attention network for joint dialog act recognition and sentiment classification. In: Proc. of AAAI, pp 13,709–13,717
Qin L, Liu T, Che W, et al (2021b) A co-interactive transformer for joint slot filling and intent detection. In: ICASSP, pp 8193–8197
Reimers N, Gurevych I (2017) Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. In: Proc. of EMNLP
Rizve MN, Khan S, Khan FS, et al (2021) Exploring complementary strengths of invariant and equivariant representations for few-shot learning. In: Proc. of CVPR, pp 10,836–10,846
Rusu AA, Rao D, Sygnowski J, et al (2018) Meta-learning with latent embedding optimization. In: Proc. of ICLR
Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. In: NeurIPS, pp 4077–4087
Tian Y, Wang Y, Krishnan D, et al (2020) Rethinking few-shot image classification: a good embedding is all you need? In: ECCV, pp 266–282
Tong M, Wang S, Xu B, et al (2021) Learning from miscellaneous other-class words for few-shot named entity recognition. In: Proc. of ACL-IJCNLP
Triantafillou E, Larochelle H, Zemel R, et al (2021) Learning a universal template for few-shot dataset generalization. In: ICML, pp 10,424–10,433
Vinyals O, Blundell C, Lillicrap T, et al (2016) Matching networks for one shot learning. In: NeurIPS
Wang B, Li L, Verma M, et al (2021a) MTUNet: Few-shot image classification with visual explanations. In: Proc. of CVPR, pp 2294–2298
Wang H, Wang Z, Fung GPC, et al (2021b) MCML: A novel memory-based contrastive meta-learning method for few shot slot tagging. arXiv preprint arXiv:2108.11635
Wang J, Wang KC, Rudzicz F, et al (2021c) Grad2task: Improved few-shot text classification using gradients for task representation. In: NeurIPS
Wang Y, Chu H, Zhang C, et al (2021d) Learning from language description: Low-shot named entity recognition via decomposed framework. In: Findings of EMNLP, pp 1618–1630
Wei P, Zeng B, Liao W (2022) Joint intent detection and slot filling with wheel-graph attention networks. Journal of Intelligent & Fuzzy Systems pp 2409–2420
Worsham J, Kalita J (2020) Multi-task learning for natural language processing in the 2020s: where are we going? Pattern Recognition Letters
Xu L, Lu X, Yuan C, et al (2021) FewCLUE: A Chinese few-shot learning evaluation benchmark. arXiv preprint arXiv:2107.07498
Xu W, Wang H, Tu Z, et al (2020) Attentional constellation nets for few-shot learning. In: Proc. of ICLR
Yang S, Zhang Y, Niu G, et al (2021) Entity concept-enhanced few-shot relation extraction. In: Proc. of ACL (Volume 2: Short Papers), pp 987–991
Ye HJ, Hu H, Zhan DC, et al (2020) Few-shot learning via embedding adaptation with set-to-set functions. In: Proc. of CVPR, pp 8808–8817
Young S, Gašić M, Thomson B, et al (2013) POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE 101(5):1160–1179
Yu D, He L, Zhang Y, et al (2021) Few-shot intent classification and slot filling with retrieved examples. In: Proc. of NAACL-HLT
Zhang C, Cai Y, Lin G, et al (2020) DeepEMD: Few-shot image classification with differentiable earth mover's distance and structured classifiers. In: Proc. of CVPR, pp 12,203–12,213
Zhang J, Bui T, Yoon S, et al (2021a) Few-shot intent detection via contrastive pre-training and fine-tuning. In: Proc. of EMNLP, pp 1906–1912
Zhang L, Shi Y, Shou L, et al (2021b) A joint and domain-adaptive approach to spoken language understanding. arXiv preprint arXiv:2107.11768
Zheng Y, Zhou J, Qian Y, et al (2022) FewNLU: Benchmarking state-of-the-art methods for few-shot natural language understanding. In: Proc. of ACL
Zhu Q, Huang K, Zhang Z, et al (2020) CrossWOZ: A large-scale Chinese cross-domain task-oriented dialogue dataset. Transactions of the Association for Computational Linguistics 8:281–295
Acknowledgments
We are grateful for the helpful comments and suggestions from the anonymous reviewers. This work was supported by the National Key R&D Program of China via grant 2020AAA0106501 and the National Natural Science Foundation of China (NSFC) via grants 61976072 and 62176078.
Author information
Authors and Affiliations
Corresponding author
Cite this article
Hou, Y., Wang, X., Chen, C. et al. FewJoint: few-shot learning for joint dialogue understanding. Int. J. Mach. Learn. & Cyber. 13, 3409–3423 (2022). https://doi.org/10.1007/s13042-022-01604-9