RepEKShot: an evidential k-nearest neighbor classifier with repulsion loss for few-shot named entity recognition

The Journal of Supercomputing

Abstract

Metric-based models have recently shown promising performance on the few-shot named entity recognition (NER) task. Many methods train their encoders with loss functions that focus on separating different entity types, but do not explicitly improve the model's ability to tell the ground-truth label from interfering labels at prediction time. In addition, nearest-neighbor inference is popular for metric-based models. However, the other surrounding neighbors can also provide useful information for NER, and it is hard to determine whether the nearest neighbor is the most suitable referent when several neighbors are all close to the query sample. To address these problems, we propose RepEKShot, a novel model that trains the encoder with a repulsion loss and extends the inference strategy from nearest neighbor to evidential k-nearest neighbor within the framework of Dempster–Shafer theory. Our model improves the training of the encoder and exploits the information provided by the other neighbors, giving a more global perspective for few-shot NER. Extensive experiments on two benchmarks built from public datasets show the performance advantages of our model in few-shot scenarios.
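To make the inference idea concrete, the following is a minimal sketch of the classic evidential k-nearest neighbor rule based on Dempster–Shafer theory, not the RepEKShot implementation itself. Each neighbor contributes a simple mass function that supports its own label and leaves the rest of the mass on the whole frame (ignorance); the k mass functions are then combined with Dempster's rule. The function name `evidential_knn`, the parameters `alpha` and `gamma`, the squared-distance kernel, and the toy distances are all assumptions made for illustration.

```python
import math


def evidential_knn(neighbors, classes, alpha=0.95, gamma=1.0):
    """Combine the evidence of k neighbors with Dempster's rule.

    neighbors: list of (distance, label) pairs for the k nearest support samples.
    classes:   list of all candidate labels (the frame of discernment).
    alpha, gamma: hypothetical hyperparameters; alpha < 1 keeps some mass on
                  the whole frame so that total conflict cannot occur.
    Returns (masses, m_omega): mass per single class and mass left on the frame.
    """
    # Start from the vacuous mass function: everything on the whole frame.
    masses = {c: 0.0 for c in classes}
    m_omega = 1.0
    for dist, label in neighbors:
        # Support for the neighbor's label decays with distance.
        s = alpha * math.exp(-gamma * dist ** 2)
        combined = {}
        for c in classes:
            if c == label:
                # {c} & {c}, {c} & frame, and frame & {c} all yield {c}.
                combined[c] = masses[c] + m_omega * s
            else:
                # Only {c} & frame yields {c}; {c} & {label} is conflict.
                combined[c] = masses[c] * (1.0 - s)
        combined_omega = m_omega * (1.0 - s)
        # Dempster normalization: discard the conflicting mass.
        norm = sum(combined.values()) + combined_omega
        masses = {c: v / norm for c, v in combined.items()}
        m_omega = combined_omega / norm
    return masses, m_omega


# Toy query: the single nearest neighbor is LOC, but two PER samples are
# almost as close, so the pooled evidence favors PER.
toy_neighbors = [(0.50, "LOC"), (0.52, "PER"), (0.55, "PER")]
masses, m_omega = evidential_knn(toy_neighbors, ["PER", "LOC", "O"])
label = max(masses, key=masses.get)
```

In this toy case plain nearest-neighbor inference would return `LOC`, while the combined evidence selects `PER`; the residual `m_omega` indicates how much belief remains uncommitted.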

Data availability

The sources of datasets have been listed in the paper.

Notes

  1. Note that we adopt only the domain transfer scenario in the latter benchmark, because the support sets for the other scenario, tag set extension, are not publicly available.

  2. Following the original setting, in the 1-shot setting there are 200 support-query pairs for testing on CoNLL, GUM and WNUT, and 100 pairs for OntoNotes. In the 5-shot setting, all datasets are tested with 100 support-query pairs.

  3. The data for CrossNER and Domain Transfer can be obtained from https://github.com/AtmaHou/FewShotTagging and https://github.com/asappresearch/structshot.

  4. https://huggingface.co/bert-base-uncased.

  5. https://huggingface.co/bert-base-cased.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62007004), the Beijing Natural Science Foundation (Grant No. 4234081) and the Major Program of Key Research Base of Humanities and Social Sciences of the Ministry of Education of China (22JJD740017).

Author information

Contributions

Haitao Liu contributed to data curation, investigation, resources, software, and writing (original draft). Weiming Peng contributed to conceptualization, funding acquisition, methodology, and writing (review and editing). Jihua Song contributed to funding acquisition, formal analysis, project administration, and supervision.

Corresponding author

Correspondence to Jihua Song.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, H., Peng, W. & Song, J. RepEKShot: an evidential k-nearest neighbor classifier with repulsion loss for few-shot named entity recognition. J Supercomput 80, 22069–22098 (2024). https://doi.org/10.1007/s11227-024-06244-0

  • DOI: https://doi.org/10.1007/s11227-024-06244-0
