Enriching RDF Data with LLM Based Named Entity Recognition and Linking on Embedded Natural Language Annotations

  • Conference paper
Knowledge Graphs and Semantic Web (KGSWC 2024)

Abstract

In this paper, we present a processing pipeline for transforming natural language annotations in RDF graphs into machine-readable and interoperable semantic annotations. The pipeline uses Named Entity Recognition (NER) and Entity Linking (EL) techniques based on a foundational Large Language Model (LLM), combined with a Knowledge Graph (KG) based knowledge injection approach for entity disambiguation and self-verification. Through a running example in the paper, we demonstrate that the pipeline can increase the number of semantic annotations in an RDF graph by deriving them from information contained in its natural language annotations. The evaluation of the proposed pipeline shows that the LLM-based NER approach produces results comparable to those of fine-tuned NER models. Furthermore, we show that the pipeline using a chain-of-thought prompting style with factual information retrieved via link traversal from an external KG achieves better entity disambiguation and linking than both a pipeline without chain-of-thought prompting and an approach relying only on information within the LLM.
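
The data flow described in the abstract (find natural language annotations, recognize entities, link them, add new triples) can be illustrated with a toy sketch. This is not the authors' implementation: the LLM-based NER and the KG-backed disambiguation are replaced by simple stand-ins (a dictionary lookup and a word-overlap score), the self-verification step is omitted, and every name here (`TOY_KG`, `recognize_entities`, `link_entity`, `enrich`, the `ex:mentions` predicate, and the `ex:` IRIs) is hypothetical.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical stand-in for an external KG:
# surface form -> candidate (IRI, description) pairs.
TOY_KG = {
    "temperature": [
        ("ex:Temperature", "physical quantity measured with a thermometer"),
        ("ex:TemperatureAlbum", "title of a music album"),
    ],
}


def recognize_entities(text: str) -> List[str]:
    """Stand-in for the NER step: return known surface forms found in text."""
    lowered = text.lower()
    return [form for form in TOY_KG if form in lowered]


def link_entity(mention: str, context: str) -> Optional[str]:
    """Stand-in for disambiguation: choose the candidate whose description
    shares the most words with the annotation text."""
    context_words = set(context.lower().split())
    best_iri, best_score = None, 0
    for iri, description in TOY_KG.get(mention, []):
        score = len(context_words & set(description.lower().split()))
        if score > best_score:
            best_iri, best_score = iri, score
    return best_iri


def enrich(graph: List[Triple],
           annotation_predicate: str = "rdfs:comment",
           link_predicate: str = "ex:mentions") -> List[Triple]:
    """Return the graph plus one new triple per linked entity."""
    added = []
    for triple in graph:
        if triple.p != annotation_predicate:
            continue  # only natural language annotations are processed
        for mention in recognize_entities(triple.o):
            iri = link_entity(mention, triple.o)
            if iri is not None:
                added.append(Triple(triple.s, link_predicate, iri))
    return graph + added


if __name__ == "__main__":
    graph = [Triple("ex:sensor1", "rdfs:comment",
                    "Measures the temperature of a room with a thermometer")]
    for triple in enrich(graph):
        print(triple)
```

In a real pipeline of the kind the abstract describes, the two stand-in functions would be replaced by LLM prompts, with candidate facts retrieved from an external KG supplied as context.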

Notes

  1. Well-known prefixes are omitted in all listings, but can be looked up on http://prefix.cc/.

  2. prefix wd: https://www.wikidata.org/wiki/.

  3. prefix qudt: https://qudt.org/vocab/unit/.

  4. prefix ex: https://example.com/.

  5. https://github.com/FreuMi/ner_pipeline.

  6. https://spacy.io/models/en#en_core_web_lg.

  7. https://github.com/explosion/spaCy.

  8. https://github.com/FreuMi/NER_Training.

  9. https://w3c.github.io/wot-thing-description/testing/report.html; available as a single RDF file at https://www.vcharpenay.link/talks/td-sem-interop.html.

  10. https://github.com/FreuMi/ner_pipeline/tree/main/evaluation/dataset.
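
The prefix declarations in notes 2 to 4 determine how prefixed names in the listings expand to full IRIs. As a minimal sketch, assuming only those three prefixes, the expansion could look as follows; the function name `expand` and the local names used in the usage lines (`sensor1`, `DEG_C`) are illustrative and not taken from the paper.

```python
# Prefix table copied from notes 2 to 4.
PREFIXES = {
    "wd": "https://www.wikidata.org/wiki/",
    "qudt": "https://qudt.org/vocab/unit/",
    "ex": "https://example.com/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'ex:sensor1' to a full IRI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


if __name__ == "__main__":
    print(expand("ex:sensor1"))
    print(expand("qudt:DEG_C"))
```

Names with a prefix outside this table raise a `ValueError` rather than being passed through, so an undeclared prefix cannot go unnoticed.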

Acknowledgement

This work was partially funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) through the Antrieb 4.0 project (Grant No. 13IK015B) and the MANDAT project (Grant No. 16DTM107A).

Author information

Corresponding author

Correspondence to Michael Freund.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Freund, M., Dorsch, R., Schmid, S., Wehr, T., Harth, A. (2025). Enriching RDF Data with LLM Based Named Entity Recognition and Linking on Embedded Natural Language Annotations. In: Tiwari, S., Villazón-Terrazas, B., Ortiz-Rodríguez, F., Sahri, S. (eds) Knowledge Graphs and Semantic Web. KGSWC 2024. Lecture Notes in Computer Science, vol 15459. Springer, Cham. https://doi.org/10.1007/978-3-031-81221-7_8

  • DOI: https://doi.org/10.1007/978-3-031-81221-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-81220-0

  • Online ISBN: 978-3-031-81221-7

  • eBook Packages: Computer Science (R0)