TOKEN Is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models

  • Conference paper
  • Text, Speech, and Dialogue (TSD 2022)

Abstract

Transferring knowledge from one domain to another is of practical importance for many tasks in natural language processing, especially when the amount of available data in the target domain is limited. In this work, we propose a novel few-shot approach to domain adaptation for Named Entity Recognition (NER): a two-step method consisting of a variable base module and a template module that leverages the knowledge captured in pre-trained language models with the help of simple descriptive patterns. Our approach is simple yet versatile and can be applied in both few-shot and zero-shot settings. Evaluation across a number of different datasets shows that this lightweight approach can boost the performance of state-of-the-art baselines by 2-5% F1-score.
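
The abstract describes the template module only at a high level; as the title suggests, the underlying idea is a cloze-style query to a pre-trained masked language model. The sketch below illustrates that general idea under stated assumptions: a template of the form "<token> is a [MASK].", an off-the-shelf BERT model, and a hand-picked mapping from predicted filler words to entity labels. The paper's actual templates, label words, and base module are not reproduced on this page.

```python
# Minimal, hypothetical sketch of the cloze-style idea suggested by the title:
# ask a pre-trained masked LM to fill "<candidate> is a [MASK]." and map the
# predicted filler word to an entity type. The template wording, label-word
# mapping, and candidate selection are illustrative assumptions, not the
# authors' exact configuration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# Hypothetical mapping from predicted filler words to coarse NER labels.
LABEL_WORDS = {
    "city": "LOC", "country": "LOC", "state": "LOC",
    "person": "PER", "man": "PER", "woman": "PER",
    "company": "ORG", "organization": "ORG",
}

def predict_entity_type(candidate: str, sentence: str, top_k: int = 10) -> str:
    """Return a coarse entity label for `candidate`, or 'O' if no label word is predicted."""
    prompt = f"{sentence} {candidate} is a [MASK]."
    for prediction in fill_mask(prompt, top_k=top_k):
        word = prediction["token_str"].strip().lower()
        if word in LABEL_WORDS:
            return LABEL_WORDS[word]
    return "O"

print(predict_entity_type("Berlin", "Angela Merkel visited Berlin in 2019."))
# A reasonable masked LM is likely to predict "city" here, yielding "LOC".
```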


Notes

  1. https://github.com/uds-lsv/TOKEN-is-a-MASK.

  2. We make use of the spaCy POS tagger: https://spacy.io/usage/linguistic-features (a minimal usage sketch follows below).
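
Footnote 2 names the spaCy POS tagger but not its exact role in the pipeline. Purely as an illustrative assumption, the sketch below shows how POS tags could be used to propose candidate entity mentions (contiguous runs of proper nouns) that a cloze template like the one above could then classify.

```python
# Illustrative only: use spaCy's POS tags to propose candidate entity mentions
# as contiguous runs of proper nouns. How the paper actually uses the tagger is
# not stated on this page.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline; includes a POS tagger

def candidate_mentions(sentence: str) -> list[str]:
    """Return contiguous runs of proper nouns (PROPN) as candidate mentions."""
    doc = nlp(sentence)
    mentions, current = [], []
    for token in doc:
        if token.pos_ == "PROPN":
            current.append(token.text)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions

print(candidate_mentions("Angela Merkel visited Berlin in 2019."))
# e.g. ['Angela Merkel', 'Berlin']
```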


Acknowledgments

This work was funded by the EU Horizon 2020 projects COMPRISE (http://www.compriseh2020.eu/) under grant agreement No. 3081705 and ROXANNE under grant agreement No. 833635.

Author information

Corresponding author

Correspondence to David Ifeoluwa Adelani.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Davody, A., Adelani, D.I., Kleinbauer, T., Klakow, D. (2022). TOKEN Is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_12

  • DOI: https://doi.org/10.1007/978-3-031-16270-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science, Computer Science (R0)
