Abstract
Few-shot text classification addresses the setting in which a model must classify newly arriving query instances after acquiring knowledge from only a few support instances. In this paper, we investigate few-shot text classification under a metric-based meta-learning framework. Although the representations of the query and support instances are the key to classification, existing studies handle them independently during text encoding. To better capture the classification features, we propose to exploit their interaction through an adapted bi-directional attention mechanism. Moreover, in contrast to previous approaches that encode each class individually, we leverage the underlying cross-class knowledge for classification. To this end, we formulate the learning target with a large margin loss, which is expected to shorten intra-class distances while enlarging inter-class distances. To validate the design, we conduct extensive experiments on three datasets, and the results demonstrate that our solution outperforms its state-of-the-art competitors. Detailed analyses further reveal that the bi-directional attention and the cross-class knowledge both contribute to the overall performance.
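The metric-based setup and the margin objective described above can be sketched minimally as follows. This is an illustrative NumPy sketch, not the paper's implementation: the embeddings are random placeholders, the episode shape (5-way, 5-shot, 16-dim) and the `margin` value are assumed for demonstration, and the hinge form of the large margin loss is one common instantiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-way 5-shot episode with 16-dim embeddings
# (shapes and values are illustrative, not from the paper).
n_way, k_shot, dim = 5, 5, 16
support = rng.normal(size=(n_way, k_shot, dim))  # encoded support instances
query = rng.normal(size=dim)                     # one encoded query instance
true_class = 2                                   # assumed label of the query

# Class prototypes: mean of each class's support embeddings
prototypes = support.mean(axis=1)                # shape (n_way, dim)

# Metric-based classification: assign the query to the nearest
# prototype under Euclidean distance.
dists = np.linalg.norm(prototypes - query, axis=1)
pred = int(np.argmin(dists))

# A hinge-style large margin loss: push the true-class distance
# below the closest competing class by at least `margin`, which
# shortens intra-class distances while enlarging inter-class ones.
margin = 1.0
competing = np.delete(dists, true_class)
loss = max(0.0, margin + dists[true_class] - competing.min())

print(pred, loss)
```

In a full model, the random embeddings would be replaced by the outputs of the text encoder (after the attention-based interaction), and the loss would be averaged over all queries in an episode and minimized by gradient descent.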
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61872446, U19B2024), the Natural Science Foundation of Hunan Province (Grant No. 2019JJ20024), and the Science and Technology Innovation Program of Hunan Province (Grant No. 2020RC4046).
Cite this article
Pang, N., Zhao, X., Wang, W. et al. Few-shot text classification by leveraging bi-directional attention and cross-class knowledge. Sci. China Inf. Sci. 64, 130103 (2021). https://doi.org/10.1007/s11432-020-3055-1