Abstract
We present a processing pipeline that transforms natural language annotations in RDF graphs into machine-readable, interoperable semantic annotations. The pipeline uses Named Entity Recognition (NER) and Entity Linking (EL) techniques based on a foundational Large Language Model (LLM), combined with a Knowledge Graph (KG) based knowledge injection approach for entity disambiguation and self-verification. Using a running example, we demonstrate that the pipeline can increase the number of semantic annotations in an RDF graph by deriving them from information contained in its natural language annotations. Our evaluation shows that the LLM-based NER approach produces results comparable to those of fine-tuned NER models. Furthermore, a pipeline that uses chain-of-thought prompting together with factual information retrieved via link traversal from an external KG achieves better entity disambiguation and linking than both a pipeline without chain-of-thought prompting and an approach relying only on information within the LLM.
Notes
- 1. Well-known prefixes are omitted in all listings, but can be looked up on http://prefix.cc/.
- 2. prefix wd: https://www.wikidata.org/wiki/.
- 3. prefix qudt: https://qudt.org/vocab/unit/.
- 4. prefix ex: https://example.com/.
- 5.
- 6.
- 7.
- 8.
- 9. https://w3c.github.io/wot-thing-description/testing/report.html; available as a single RDF file at https://www.vcharpenay.link/talks/td-sem-interop.html.
- 10.
Acknowledgement
This work was partially funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) through the Antrieb 4.0 project (Grant No. 13IK015B) and the MANDAT project (Grant No. 16DTM107A).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Freund, M., Dorsch, R., Schmid, S., Wehr, T., Harth, A. (2025). Enriching RDF Data with LLM Based Named Entity Recognition and Linking on Embedded Natural Language Annotations. In: Tiwari, S., Villazón-Terrazas, B., Ortiz-Rodríguez, F., Sahri, S. (eds) Knowledge Graphs and Semantic Web. KGSWC 2024. Lecture Notes in Computer Science, vol 15459. Springer, Cham. https://doi.org/10.1007/978-3-031-81221-7_8
DOI: https://doi.org/10.1007/978-3-031-81221-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-81220-0
Online ISBN: 978-3-031-81221-7
eBook Packages: Computer Science (R0)