LLMs4OM: Matching Ontologies with Large Language Models

Babaei Giglou, Hamed; D’Souza, Jennifer; Engel, Felix; Auer, Sören

doi:10.1007/978-3-031-78952-6_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15344)

Included in the following conference series:

European Semantic Web Conference

Abstract

Ontology Matching (OM) is a critical task in knowledge integration, where aligning heterogeneous ontologies facilitates data interoperability and knowledge sharing. Traditional OM systems often rely on expert knowledge or predictive models, with limited exploration of the potential of Large Language Models (LLMs). We present the LLMs4OM framework, a novel approach to evaluate the effectiveness of LLMs in OM tasks. This framework uses two modules, one for retrieval and one for matching, enhanced by zero-shot prompting across three ontology representations: concept, concept-parent, and concept-children. Through comprehensive evaluations using 20 OM datasets from various domains, we demonstrate that LLMs, under the LLMs4OM framework, can match and even surpass the performance of traditional OM systems, particularly in complex matching scenarios. Our results highlight the potential of LLMs to contribute significantly to the field of OM.
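The retrieve-then-match design described above can be illustrated with a minimal Python sketch. It is not the LLMs4OM implementation: the `Concept` class, the text templates for the three representations, the bag-of-words cosine retriever (standing in for a real embedding-based retriever), the prompt wording, and the `stub_llm` function (standing in for an actual LLM call) are all hypothetical, chosen only to show how a retrieval module narrows the candidate targets before a zero-shot matching module judges each pair.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Concept:
    """Hypothetical toy ontology class: a label plus its direct parent and child labels."""
    label: str
    parents: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)


def represent(c: Concept, mode: str) -> str:
    """Render a concept as text in one of the three representations (templates are assumptions)."""
    if mode == "concept":
        return c.label
    if mode == "concept-parent":
        return f"{c.label}, parents: {', '.join(c.parents)}"
    if mode == "concept-children":
        return f"{c.label}, children: {', '.join(c.children)}"
    raise ValueError(f"unknown representation: {mode}")


def _bag_of_words(text: str) -> Counter:
    # Lowercased token counts; a simple stand-in for a learned embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(source: Concept, targets: List[Concept], mode: str, k: int = 2) -> List[Concept]:
    """Retrieval step: keep the k target concepts most similar to the source."""
    query = _bag_of_words(represent(source, mode))
    ranked = sorted(targets, key=lambda t: _cosine(query, _bag_of_words(represent(t, mode))),
                    reverse=True)
    return ranked[:k]


def match(source: Concept, candidates: List[Concept], mode: str,
          llm: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Matching step: ask an LLM, zero-shot, whether each (source, candidate) pair is equivalent."""
    pairs = []
    for cand in candidates:
        prompt = (
            "Classify whether the following two concepts are the same.\n"
            f"First concept: {represent(source, mode)}\n"
            f"Second concept: {represent(cand, mode)}\n"
            "Answer yes or no:"
        )
        if llm(prompt).strip().lower().startswith("yes"):
            pairs.append((source.label, cand.label))
    return pairs


def stub_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM: says "yes" when the two labels match case-insensitively.
    lines = prompt.splitlines()
    first = lines[1].split(": ", 1)[1].split(",")[0].lower()
    second = lines[2].split(": ", 1)[1].split(",")[0].lower()
    return "yes" if first == second else "no"


source = Concept("Heart", parents=["Organ"], children=["Cardiac valve"])
targets = [
    Concept("Lung", parents=["organ"], children=["bronchus"]),
    Concept("heart", parents=["organ"], children=["heart valve"]),
    Concept("Femur", parents=["bone"], children=[]),
]
candidates = retrieve(source, targets, "concept-parent", k=2)
print(match(source, candidates, "concept-parent", stub_llm))  # [('Heart', 'heart')]
```

Separating the two steps keeps the expensive LLM call limited to the top-k retrieved candidates rather than every source-target pair; switching the `mode` argument lets the same pipeline be run under each of the three representations.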

Acknowledgments

We thank Nenad Krdzavac for valuable insights on a previous draft of this paper. This work was supported by the German BMBF project SCINEXT (ID 01lS22070), the European Research Council for ScienceGRAPH (GA ID: 819536), and German DFG for NFDI4DataScience (no. 460234259).

Author information

Authors and Affiliations

TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Hamed Babaei Giglou, Jennifer D’Souza, Felix Engel & Sören Auer

Corresponding author

Correspondence to Hamed Babaei Giglou.

Editor information

Editors and Affiliations

King’s College London, London, UK
Albert Meroño Peñuela
Universidad Politécnica de Madrid, Madrid, Spain
Oscar Corcho
University of Amsterdam, Amsterdam, Noord-Holland, The Netherlands
Paul Groth
King’s College London, London, UK
Elena Simperl
University of Liverpool, Liverpool, UK
Valentina Tamma
National Research Council, Bologna, Italy
Andrea Giovanni Nuzzolese
Universidad Politécnica de Madrid, Madrid, Spain
Maria Poveda-Villalón
Vienna University of Economics and Business, Vienna, Austria
Marta Sabou
University of Bologna, Bologna, Italy
Valentina Presutti
Cefriel - Politecnico Di Milano, Milan, Italy
Irene Celino
Semantic Web Company, Vienna, Austria
Artem Revenko
University of Paris-Saclay, Gif-sur-Yvette, France
Joe Raad
Ludwig-Maximilians-Universität München, Munich, Germany
Bruno Sartini
EURECOM, Biot, France
Pasquale Lisena

About this paper

Cite this paper

Babaei Giglou, H., D’Souza, J., Engel, F., Auer, S. (2025). LLMs4OM: Matching Ontologies with Large Language Models. In: Meroño Peñuela, A., et al. The Semantic Web: ESWC 2024 Satellite Events. ESWC 2024. Lecture Notes in Computer Science, vol 15344. Springer, Cham. https://doi.org/10.1007/978-3-031-78952-6_3

DOI: https://doi.org/10.1007/978-3-031-78952-6_3
Published: 28 January 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78951-9
Online ISBN: 978-3-031-78952-6
eBook Packages: Computer Science, Computer Science (R0)