Abstract
Few-shot text classification addresses the setting in which a model must classify newly arriving query instances after acquiring knowledge from only a few support instances. In this paper, we investigate few-shot text classification under a metric-based meta-learning framework. Although the representations of the query and support instances are the key to classification, existing studies handle them independently during text encoding. To better capture the classification features, we propose to exploit their interaction through an adapted bi-directional attention mechanism. Moreover, in contrast to previous approaches that encode each class individually, we leverage the underlying cross-class knowledge for classification. To this end, we formulate the learning target with a large margin loss, which is expected to shorten intra-class distances while enlarging inter-class distances. To validate the design, we conduct extensive experiments on three datasets, and the results demonstrate that our solution outperforms its state-of-the-art competitors. Detailed analyses further reveal that the bi-directional attention and the cross-class knowledge both contribute to the overall performance.
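The metric-based setup and the margin objective described above can be sketched minimally as follows. This is an illustrative NumPy sketch, not the paper's implementation: the embeddings are random placeholders, the episode shape (5-way, 5-shot, 16-dim) and the `margin` value are assumed for demonstration, and the hinge form of the large margin loss is one common instantiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-way 5-shot episode with 16-dim embeddings
# (shapes and values are illustrative, not from the paper).
n_way, k_shot, dim = 5, 5, 16
support = rng.normal(size=(n_way, k_shot, dim))  # encoded support instances
query = rng.normal(size=dim)                     # one encoded query instance
true_class = 2                                   # assumed label of the query

# Class prototypes: mean of each class's support embeddings
prototypes = support.mean(axis=1)                # shape (n_way, dim)

# Metric-based classification: assign the query to the nearest
# prototype under Euclidean distance.
dists = np.linalg.norm(prototypes - query, axis=1)
pred = int(np.argmin(dists))

# A hinge-style large margin loss: push the true-class distance
# below the closest competing class by at least `margin`, which
# shortens intra-class distances while enlarging inter-class ones.
margin = 1.0
competing = np.delete(dists, true_class)
loss = max(0.0, margin + dists[true_class] - competing.min())

print(pred, loss)
```

In a full model, the random embeddings would be replaced by the outputs of the text encoder (after the attention-based interaction), and the loss would be averaged over all queries in an episode and minimized by gradient descent.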
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61872446, U19B2024), the Natural Science Foundation of Hunan Province (Grant No. 2019JJ20024), and the Science and Technology Innovation Program of Hunan Province (Grant No. 2020RC4046).
Cite this article
Pang, N., Zhao, X., Wang, W. et al. Few-shot text classification by leveraging bi-directional attention and cross-class knowledge. Sci. China Inf. Sci. 64, 130103 (2021). https://doi.org/10.1007/s11432-020-3055-1