
Knowledge-Grounded and Self-extending NER

  • Conference paper
  • HCI International 2023 Posters (HCII 2023)

Abstract

The wave of digitization has begun, and organizations now deal with huge amounts of data such as logs, websites, and documents. A common way to make the information in these sources machine-accessible for automated processing is to extract it and store it in a knowledge graph. A key step in this approach is recognizing entities. While standard named entity recognition (NER) models work well for common entity types, they typically fail to recognize custom ones. Recognizing custom entities requires manually annotated data and custom-trained NER models. To extract this information efficiently, this paper proposes a Gazetteer approach that uses a knowledge graph to build a coarse but fast NER component, reducing the need for manual annotation and saving human effort. Focusing on a university use case, the Gazetteer is integrated into a chatbot for entity recognition. The Gazetteer can also be used to annotate data for training an NER model. The trained model can then recognize previously unseen custom entities, which are added to the knowledge graph. This improves the knowledge graph and makes it self-extending.
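The abstract describes the Gazetteer loop only at a high level. The Python sketch below illustrates one possible reading of it under stated assumptions: the knowledge graph is reduced to a list of (entity name, entity type) pairs, matching is a case-insensitive longest-match lookup over whitespace tokens, gazetteer matches become BIO tags that could serve as weak training labels for an NER model, and the trained model's predictions are given as token spans (stubbed by hand here). All names (`Gazetteer`, `to_bio`, `extend_gazetteer`) and the sample entities are hypothetical; this is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gazetteer:
    """Hypothetical dictionary-based NER: case-insensitive longest match over tokens."""

    # Lowercased token tuple -> entity type, e.g. ("machine", "learning") -> "COURSE".
    entries: dict[tuple[str, ...], str] = field(default_factory=dict)
    max_len: int = 0  # length (in tokens) of the longest known entity name

    def add(self, name: str, label: str) -> None:
        key = tuple(name.lower().split())
        self.entries[key] = label
        self.max_len = max(self.max_len, len(key))

    def match(self, tokens: list[str]) -> list[tuple[int, int, str]]:
        """Return non-overlapping (start, end, label) token spans, scanning left to right."""
        lowered = [t.lower() for t in tokens]
        spans: list[tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            # Try the longest candidate first so "Applied Machine Learning"
            # wins over the shorter "Machine Learning".
            for n in range(min(self.max_len, len(tokens) - i), 0, -1):
                label = self.entries.get(tuple(lowered[i:i + n]))
                if label is not None:
                    spans.append((i, i + n, label))
                    i += n
                    break
            else:  # no known entity starts at token i
                i += 1
        return spans


def to_bio(tokens: list[str], spans: list[tuple[int, int, str]]) -> list[str]:
    """Turn gazetteer matches into BIO tags usable as weak NER training labels."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"
    return tags


def extend_gazetteer(gaz: Gazetteer, tokens: list[str],
                     predicted: list[tuple[int, int, str]]) -> list[str]:
    """Add model-predicted entities the gazetteer does not know yet; return their names."""
    added = []
    for start, end, label in predicted:
        name = " ".join(tokens[start:end])
        if tuple(name.lower().split()) not in gaz.entries:
            gaz.add(name, label)
            added.append(name)
    return added


# Stand-in for entities read from the knowledge graph (hypothetical sample data).
gaz = Gazetteer()
for name, label in [("Machine Learning", "COURSE"),
                    ("Applied Machine Learning", "COURSE"),
                    ("Business Informatics", "PROGRAM")]:
    gaz.add(name, label)

tokens = "Is Applied Machine Learning part of Business Informatics ?".split()
spans = gaz.match(tokens)  # [(1, 4, 'COURSE'), (6, 8, 'PROGRAM')]
tags = to_bio(tokens, spans)
# ['O', 'B-COURSE', 'I-COURSE', 'I-COURSE', 'O', 'O', 'B-PROGRAM', 'I-PROGRAM', 'O']

# Pretend a trained NER model tagged an unseen course name in a new question.
new_tokens = "Who teaches Deep Learning ?".split()
added = extend_gazetteer(gaz, new_tokens, [(2, 4, "COURSE")])  # ['Deep Learning']
print(gaz.match(new_tokens))  # [(2, 4, 'COURSE')]
```

Preferring the longest match keeps a multi-word name from being split into a shorter, less specific entity. In a real system the newly found entities would presumably be written back to the knowledge graph itself rather than only to an in-memory dictionary, so that the next Gazetteer build picks them up.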

Author information

Corresponding author

Correspondence to Sudarshan Kamath Barkur.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kamath Barkur, S., Schacht, S., Lanquillon, C. (2023). Knowledge-Grounded and Self-extending NER. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds) HCI International 2023 Posters. HCII 2023. Communications in Computer and Information Science, vol 1836. Springer, Cham. https://doi.org/10.1007/978-3-031-36004-6_60

  • DOI: https://doi.org/10.1007/978-3-031-36004-6_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36003-9

  • Online ISBN: 978-3-031-36004-6

  • eBook Packages: Computer Science, Computer Science (R0)