Abstract
Ontology engineering is a complex and time-consuming task, even with the help of current modelling environments, and the result is often error-prone unless developed by experienced ontology engineers. However, with the emergence of new tools, such as generative AI, inexperienced modellers might receive assistance. This study investigates the capability of Large Language Models (LLMs) to generate OWL ontologies directly from ontological requirements. Specifically, our research question centres on the potential of LLMs to assist human modellers by generating OWL modelling suggestions and alternatives. We experiment with several state-of-the-art models. Our methodology incorporates diverse prompting techniques, such as Chain of Thoughts (CoT), Graph of Thoughts (GoT), and Decomposed Prompting, alongside the Zero-shot method. Results show that, currently, GPT-4 is the only model capable of providing suggestions of sufficient quality; we also note the benefits and drawbacks of the prompting techniques. Overall, we conclude that it seems feasible to use advanced LLMs to generate OWL suggestions that are at least comparable in quality to those of novice human modellers. Our research is a pioneering contribution in this area, being the first to systematically study the ability of LLMs to assist ontology engineers.
Notes
- 1.
- 2.
- 3.
- 4.
See for instance the vocabularies section of https://www.w3.org/TR/ld-bp/ or the whitepaper at https://www.nist.gov/document/nist-ai-rfi-cubrcinc002pdf for OBO ontologies.
- 5.
Details related to the versions and settings of these models can be found in our supplementary material.
- 6.
The test is passed if a query can be formulated, i.e., no test data is used, and the complexity of the queries has not been analysed so far.
Acknowledgement
This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement no. 101058682 (Onto-DESIDE), and is supported by the strategic research area Security Link. The student solutions used in the research were collected as part of a master’s course taught by Assoc. Prof. Blomqvist while employed at Jönköping University.
ChatGPT was used to enhance the readability of some of the text and improve the language of this paper, after the content had first been written manually. All material was then checked manually before submission.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Appendix
1.1 Motivations, Limitations and Negative Results
Prompt Components: As mentioned in the methodology (Sect. 3), each prompt consists of four sections. At a quick glance, the header and story sections appear to be necessary, since they provide the brief task instruction and the story requirements, while the helper and footer sections may be considered optional. However, removing the helper section causes the LLM to avoid modelling reifications altogether and to misplace properties, for example by using a datatype property as the range of an object property. The helper begins by outlining strategies for establishing a taxonomy, which is otherwise often ignored by LLMs.
The footer, or pitfall section, also improves the output significantly. It presents the LLM with common mistakes that it tends to produce. The pitfalls to avoid include: (1) returning empty output for the given prompt; (2) ignoring Turtle syntax and instead returning a list of items in Python syntax; (3) producing OWL output without establishing any taxonomy of classes; (4) in the thought-based prompting techniques, executing the complete plan (several steps) at the current step, since LLMs can ignore instructions and give the complete answer at the first step; and (5) providing explanations instead of code.
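To make the four-section layout concrete, the following is a minimal sketch of how such a prompt could be assembled. The section texts are hypothetical paraphrases of the kind of content described above, not the exact prompts used in our experiments.

```python
# Hypothetical sketch of the four-section prompt layout (header, story, helper, footer).
# The section texts are illustrative paraphrases, not the prompts used in the paper.

HEADER = (
    "You are an ontology engineer. Model the requirements below as an OWL "
    "ontology and return it in Turtle syntax."
)

HELPER = (
    "First establish a taxonomy of classes. Use reification where a relation "
    "needs its own attributes. Never use a datatype property as the range of "
    "an object property."
)

FOOTER = (
    "Avoid these pitfalls: do not return empty output; do not replace Turtle "
    "with a Python list of items; do not omit the class taxonomy; do not "
    "execute the whole plan in a single step; return code, not explanations."
)

def build_prompt(story: str) -> str:
    """Concatenate the four prompt sections around the ontology story."""
    return "\n\n".join([HEADER, story, HELPER, FOOTER])

if __name__ == "__main__":
    print(build_prompt("A museum wants to describe its exhibits, artists and loans ..."))
```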
Ontology Design Patterns (ODPs) serve as guides for ontology engineers when modelling an ontology. However, adding ODP examples to the prompts seems to degrade output quality. Even though the prompt and story fit within the 32K context window, the LLMs tend to forget the ontology story (we also tried the 128K-context GPT-4-turbo model, and it failed as well). This could be because the large context distracts current LLMs, possibly due to limitations of their attention layers. We use the term “distraction” because the model starts modelling the ODPs themselves in the output instead of addressing the given task.
Limitations: This study, while insightful, has several limitations. Our choice of evaluation method was also influenced by the time constraints faced by the human experts in manually evaluating the outputs. While this approach was necessary given the available resources, it may not capture the full depth and nuance of the LLM-generated ontologies, compared to a more thorough, albeit more time-consuming, manual evaluation.
Due to their extensive branching, the Tree of Thoughts technique and the full version of the Graph of Thoughts technique proved expensive. This complexity led to slower processing times and increased costs, limiting their practicality for larger-scale or time-sensitive applications.
We used the Microsoft Azure API to access GPT-3.5 and GPT-4, version 0613, trained on data up to 2021. Consequently, our analysis did not consider any advancements or updates in these models post-2021, including the seed feature introduced in newer releases, which might limit the relevance of our findings with respect to the latest LLM capabilities. The limited access to hyperparameters in GPT-4 and GPT-3.5 also presented challenges in our experiments. Despite setting the temperature and penalty parameters to zero (except in plan generation for GoT and CoT-SC, where the temperature was set to 0.5), we observed inconsistencies in the outcomes when using identical prompts. This variability underscores the value of open-source LLMs for achieving more consistent and reliable performance, rather than depending on unpredictable factors.
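As an illustration of the parameter settings described above, the sketch below calls a GPT model through the Azure OpenAI API with temperature and penalties set to zero. The endpoint, deployment name, and API version are placeholders, not our actual configuration; as noted, identical prompts may still yield different outputs.

```python
# Illustrative sketch only: endpoint, deployment name and API version are placeholders,
# not the configuration used in the experiments.
import os
from openai import AzureOpenAI  # openai>=1.0 Python package

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_version="2023-07-01-preview",                           # placeholder
)

prompt = "Model the following ontology story as an OWL ontology in Turtle: ..."

response = client.chat.completions.create(
    model="gpt-4-0613",        # placeholder deployment name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,             # 0 everywhere, except 0.5 for plan generation in GoT / CoT-SC
    frequency_penalty=0,
    presence_penalty=0,
)
print(response.choices[0].message.content)
```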
We faced another setback in our attempts to produce more compact OWL code, whether to reduce the context size or to improve the modelling in general. For example, in CQbyCQ, once a CQ has been addressed, we simply merge its output with that of the previous CQs ourselves, instead of asking the LLM to check whether the CQ has already been covered and to merge accordingly. This choice was made because LLMs often forgot to merge classes (or properties) from the previous output, which resulted in incomplete models.
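The sketch below illustrates the kind of merging we perform outside the LLM in CQbyCQ: the Turtle fragment produced for each CQ is parsed and its triples accumulated in a running graph. The function and variable names are ours, for illustration only, and are not taken from the paper's implementation.

```python
# Illustrative sketch of merging per-CQ Turtle fragments outside the LLM;
# names are ours, not from the paper's implementation.
from rdflib import Graph

def merge_cq_outputs(turtle_fragments):
    """Parse the Turtle produced for each CQ and accumulate all triples in one graph."""
    merged = Graph()
    for fragment in turtle_fragments:
        part = Graph()
        part.parse(data=fragment, format="turtle")
        merged += part  # rdflib adds every triple of `part` to `merged`
    return merged.serialize(format="turtle")
```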
Lastly, we encountered a further challenge when experimenting with few-shot prompting techniques, in which a few worked examples are provided to the LLM as part of the prompt. We found it difficult to select examples of ontology modelling that were not too similar to the ontology story, as such examples could effectively hand the answer to the LLM. Moreover, this setting risks the same effect as described above for ODPs (LLM distraction due to the large context size).
1.2 Initial Experiment Result Details
Due to space limitations, we were not able to present all details of the initial experiment in the main body of the paper, only a summary of the conclusions. The detailed results of the initial experiment (phase 2) are therefore reported here. Table 2 presents the LLM-prompting scores, averaged over the three tasks and the 8 criteria, with a threshold of 0.9 chosen as the pass level.
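As an illustration of how such averaged scores and the pass threshold relate, the sketch below averages hypothetical per-task, per-criterion scores and applies the 0.9 threshold; the score values shown are made up and only the averaging scheme and threshold follow the text above.

```python
# Hypothetical score values; only the averaging scheme and the 0.9 pass threshold
# follow the description in the text.
scores = {            # scores[task] = list of 8 criterion scores in [0, 1]
    "task1": [1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0],
    "task2": [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0],
    "task3": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

all_scores = [s for task_scores in scores.values() for s in task_scores]
average = sum(all_scores) / len(all_scores)
passed = average >= 0.9   # the pass threshold used in the initial experiment
print(f"average = {average:.3f}, passed = {passed}")
```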
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Saeedizade, M.J., Blomqvist, E. (2024). Navigating Ontology Development with Large Language Models. In: Meroño Peñuela, A., et al. The Semantic Web. ESWC 2024. Lecture Notes in Computer Science, vol 14664. Springer, Cham. https://doi.org/10.1007/978-3-031-60626-7_8
DOI: https://doi.org/10.1007/978-3-031-60626-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60625-0
Online ISBN: 978-3-031-60626-7
eBook Packages: Computer Science, Computer Science (R0)