Abstract
Digitization leaves organizations with large volumes of data, such as logs, websites, and documents. A common way to make the information in these sources machine-accessible for automated processing is to extract it and store it in a knowledge graph. A key task in this approach is entity recognition. While off-the-shelf named entity recognition (NER) models work well for common entity types, they typically fail to recognize custom entities. Recognizing custom entities requires manually annotated data and the training of custom NER models. To extract this information efficiently, this paper proposes a Gazetteer approach that uses a knowledge graph to create a coarse but fast NER component, reducing the need for manual annotation. In a university use case, the Gazetteer is integrated into a chatbot for entity recognition. In addition, the Gazetteer can be used to annotate data on which an NER model can then be trained. The trained NER model can subsequently recognize unseen custom entities, which are then added to the knowledge graph. This improves the knowledge graph and makes it self-extending.
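As a rough illustration of the gazetteer idea described in the abstract, the following minimal sketch tags text by looking up known entity names and emitting BIO labels that could serve as training annotations. It is not the authors' implementation: the entity names, the labels, the `KG_ENTITIES` dictionary (standing in for an export from a knowledge graph), and the greedy longest-match strategy are all assumptions made for this example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical (name -> type) pairs standing in for entities exported from a
# knowledge graph; the paper's actual graph schema is not shown here.
KG_ENTITIES: Dict[str, str] = {
    "machine learning": "COURSE",
    "data science": "PROGRAM",
    "examination office": "FACILITY",
}


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def build_gazetteer(entities: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Index each entity name by its lowercased token tuple."""
    return {tuple(tokenize(name.lower())): label for name, label in entities.items()}


def annotate(text: str, gazetteer: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Tag tokens with BIO labels using greedy longest-match lookup."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so longer names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                matched = n
                break
        # Skip past a matched span, or advance one token if nothing matched.
        i += matched or 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    gazetteer = build_gazetteer(KG_ENTITIES)
    for token, tag in annotate("Where is the Examination Office for Data Science?", gazetteer):
        print(token, tag)
```

In a self-extending setup of the kind the abstract outlines, entities found by a trained NER model but absent from the dictionary would be written back to the graph, and the gazetteer would be rebuilt from the updated export.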
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kamath Barkur, S., Schacht, S., Lanquillon, C. (2023). Knowledge-Grounded and Self-extending NER. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds) HCI International 2023 Posters. HCII 2023. Communications in Computer and Information Science, vol 1836. Springer, Cham. https://doi.org/10.1007/978-3-031-36004-6_60
DOI: https://doi.org/10.1007/978-3-031-36004-6_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36003-9
Online ISBN: 978-3-031-36004-6
eBook Packages: Computer Science, Computer Science (R0)