
CLINER: exploring task-relevant features and label semantic for few-shot named entity recognition

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Few-shot named entity recognition (NER) aims to recognize novel-class named entities in low-resource scenarios, where the support set contains only limited data with sparse labels. Existing methods neglect both the relevance of the support set to the current task and the semantics carried by label names. In this paper, we propose CLINER, a multi-task learning framework for few-shot NER built on contrastive learning. CLINER jointly learns label semantic information and support set information. For support set information, we find the view of the support set that is most relevant to the current task, maximizing the utilization of each support set; a momentum encoder with a dynamic queue keeps track of the positive and negative examples learned from previous support sets and keeps them up to date. For label semantic information, which is implicit in label names, we derive it explicitly with a pre-trained language encoder. Experiments demonstrate that our model improves overall performance compared with recent baseline models and achieves state-of-the-art results on commonly used standard datasets. The source code of CLINER will be available at: https://github.com/yizumi426/CLINER.
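The abstract describes two mechanisms: a MoCo-style momentum encoder with a dynamic queue that retains positive and negative token representations from previous support sets, and label-semantic embeddings obtained by encoding label names with a pre-trained encoder. The PyTorch sketch below illustrates only the first mechanism under stated assumptions; it is not the authors' implementation, and every name and hyperparameter in it (TokenEncoder, MomentumQueue, QUEUE_SIZE, MOMENTUM, the 0.07 temperature) is hypothetical and chosen for illustration.

```python
# Minimal sketch of a momentum encoder + dynamic queue for contrastive few-shot NER.
# Hypothetical names and hyperparameters; not the released CLINER code.
import torch
import torch.nn as nn
import torch.nn.functional as F

QUEUE_SIZE = 1024   # assumed queue length
MOMENTUM = 0.999    # assumed momentum coefficient for the key-encoder update
DIM = 128           # assumed projection dimension

class TokenEncoder(nn.Module):
    """Stand-in for a BERT-style token encoder followed by a projection head."""
    def __init__(self, hidden=768, dim=DIM):
        super().__init__()
        self.proj = nn.Linear(hidden, dim)

    def forward(self, token_states):            # token_states: (N, hidden)
        return F.normalize(self.proj(token_states), dim=-1)

class MomentumQueue(nn.Module):
    """MoCo-style key encoder plus a fixed-size queue of keys from past support sets."""
    def __init__(self, query_encoder):
        super().__init__()
        self.key_encoder = TokenEncoder()
        self.key_encoder.load_state_dict(query_encoder.state_dict())
        for p in self.key_encoder.parameters():   # key encoder is updated only by momentum
            p.requires_grad = False
        self.register_buffer("queue", F.normalize(torch.randn(QUEUE_SIZE, DIM), dim=-1))
        self.register_buffer("ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def momentum_update(self, query_encoder):
        for pk, pq in zip(self.key_encoder.parameters(), query_encoder.parameters()):
            pk.mul_(MOMENTUM).add_(pq, alpha=1.0 - MOMENTUM)

    @torch.no_grad()
    def enqueue(self, keys):                     # keys: (B, DIM); B should divide QUEUE_SIZE
        b = keys.shape[0]
        i = int(self.ptr)
        self.queue[i:i + b] = keys
        self.ptr[0] = (i + b) % QUEUE_SIZE

def info_nce(query, positive_key, queue, temperature=0.07):
    """InfoNCE loss: pull each query toward its positive key, push it from queued negatives."""
    l_pos = (query * positive_key).sum(-1, keepdim=True)   # (B, 1)
    l_neg = query @ queue.t()                               # (B, QUEUE_SIZE)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(query.size(0), dtype=torch.long)   # positive key is index 0
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    q_enc = TokenEncoder()
    bank = MomentumQueue(q_enc)
    fake_tokens = torch.randn(8, 768)            # stand-in for encoder hidden states of one episode
    q = q_enc(fake_tokens)
    with torch.no_grad():
        k = bank.key_encoder(fake_tokens)
    loss = info_nce(q, k, bank.queue)
    loss.backward()                              # update the query encoder as usual
    bank.momentum_update(q_enc)                  # then slowly drag the key encoder along
    bank.enqueue(k)                              # and refresh the queue with this episode's keys
    print(float(loss))
```

In a per-episode training loop, the queue would be refreshed after every support set, so negatives drawn from earlier episodes remain available without re-encoding them.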


Data availability

The datasets analyzed during the current study are available in the Few-NERD repository.


Acknowledgements

This work was jointly supported by the National Natural Science Foundation of China (Grants 61877043 and 61877044).

Author information

Corresponding author

Correspondence to Jian Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Li, X., Li, X., Zhao, M. et al. CLINER: exploring task-relevant features and label semantic for few-shot named entity recognition. Neural Comput & Applic 36, 4679–4691 (2024). https://doi.org/10.1007/s00521-023-09285-3

