Abstract
Metric-based models have recently shown promising performance on few-shot named entity recognition (NER). Many methods train their encoders with loss functions that focus on separating different entity types, but neglect the ability to tell the ground-truth label apart from interfering labels at prediction time. In addition, metric-based models commonly use nearest-neighbor inference. However, the other surrounding neighbors can also carry useful information for NER, and when several neighbors are all close to the query sample it is hard to decide whether the nearest one is the most suitable referent. To address these problems, we propose RepEKShot, a novel model that trains the encoder with a repulsion loss and extends the inference strategy from nearest neighbor to evidential k-nearest neighbor within the framework of Dempster–Shafer theory. Our model improves encoder training and exploits the information provided by the other neighbors, giving a more global perspective for few-shot NER. Extensive experiments on two benchmarks built from public datasets show that our model offers performance advantages in few-shot scenarios.
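To make the inference idea concrete, the following is a minimal sketch of a classic evidential k-nearest neighbor rule from the Dempster–Shafer literature. It is an illustration under stated assumptions, not the exact formulation used in RepEKShot: the function names, the `OMEGA` sentinel, the default values of `alpha` and `gamma`, and the toy distances are all hypothetical. Each neighbor places part of its belief on its own label and the remainder on the whole label set (ignorance), and the k pieces of evidence are pooled with Dempster's rule of combination.

```python
import math

# Hypothetical sentinel key for the whole frame of discernment (total ignorance).
OMEGA = "__OMEGA__"


def neighbor_mass(label, distance, labels, alpha=0.95, gamma=1.0):
    """Mass function induced by one neighbor.

    The neighbor supports its own label with strength alpha * exp(-gamma * d^2);
    the remaining mass stays on OMEGA. alpha and gamma are assumed defaults.
    """
    support = alpha * math.exp(-gamma * distance ** 2)
    mass = {lab: 0.0 for lab in labels}
    mass[label] = support
    mass[OMEGA] = 1.0 - support
    return mass


def dempster_combine(m1, m2, labels):
    """Dempster's rule for masses whose focal sets are singletons and OMEGA."""
    combined = {
        lab: m1[lab] * m2[lab] + m1[lab] * m2[OMEGA] + m1[OMEGA] * m2[lab]
        for lab in labels
    }
    combined[OMEGA] = m1[OMEGA] * m2[OMEGA]
    # The sum equals 1 minus the conflict; it is positive whenever alpha < 1.
    total = sum(combined.values())
    return {key: value / total for key, value in combined.items()}


def eknn_predict(neighbors, labels, k=3, alpha=0.95, gamma=1.0):
    """Pool the evidence of the k nearest (label, distance) pairs."""
    nearest = sorted(neighbors, key=lambda pair: pair[1])[:k]
    # Start from the vacuous mass function: everything on OMEGA.
    mass = {lab: 0.0 for lab in labels}
    mass[OMEGA] = 1.0
    for label, distance in nearest:
        evidence = neighbor_mass(label, distance, labels, alpha, gamma)
        mass = dempster_combine(mass, evidence, labels)
    best = max(labels, key=lambda lab: mass[lab])
    return best, mass


# Toy query: the single nearest neighbor is PER, but two LOC neighbors are almost as close.
labels = ["PER", "LOC", "O"]
neighbors = [("PER", 0.50), ("LOC", 0.55), ("LOC", 0.60), ("O", 2.00)]
label_1nn, _ = eknn_predict(neighbors, labels, k=1)
label_eknn, mass_eknn = eknn_predict(neighbors, labels, k=3)
print(label_1nn, label_eknn)
```

In this toy case the nearest-neighbor decision (`k=1`) follows the single closest sample, while pooling three neighbors lets the two nearly equidistant `LOC` samples outweigh it. This is the situation the abstract describes, where several neighbors are all close to the query.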
Data availability
The sources of datasets have been listed in the paper.
Notes
Note that we adopt only the domain transfer scenario of the latter benchmark, because the support sets for the other scenario, tag set extension, are not publicly available.
Following the original setting, in the 1-shot setting there are 200 support-query pairs for testing CoNLL, GUM and WNUT, and 100 pairs for OntoNotes. In the 5-shot setting, all datasets are tested with 100 support-query pairs.
The data for CrossNER and Domain Transfer can be obtained from https://github.com/AtmaHou/FewShotTagging and https://github.com/asappresearch/structshot.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 62007004), the Beijing Natural Science Foundation (Grant No. 4234081) and the Major Program of Key Research Base of Humanities and Social Sciences of the Ministry of Education of China (22JJD740017).
Author information
Contributions
Haitao Liu contributed to data curation, investigation, resources, software, and writing of the original draft. Weiming Peng contributed to conceptualization, funding acquisition, methodology, and review and editing of the manuscript. Jihua Song contributed to funding acquisition, formal analysis, project administration, and supervision.
Ethics declarations
Conflict of interest
The authors declare no potential conflict of interest.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, H., Peng, W. & Song, J. RepEKShot: an evidential k-nearest neighbor classifier with repulsion loss for few-shot named entity recognition. J Supercomput 80, 22069–22098 (2024). https://doi.org/10.1007/s11227-024-06244-0
DOI: https://doi.org/10.1007/s11227-024-06244-0