Towards Zero-shot Knowledge Graph building: Automated Schema Inference

Authors:

Salvatore Carta,

Alessandro Giuliani,

Marco Manolo Manca,

Leonardo Piano,

Sandro Gabriele Tiddia

UMAP Adjunct '24: Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization

Pages 467 - 473

https://doi.org/10.1145/3631700.3665234

Published: 28 June 2024

Abstract

In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm for automatically generating Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically constructed manually, which requires intensive human effort, and they are prone to information loss caused by negligence, incomplete analysis, or human subjectivity. Limiting human bias and the resulting information loss when creating Knowledge Graphs is paramount, particularly for user modeling in sectors such as education or healthcare. To this end, we propose a novel approach to automatically generating an entity schema. The methodology combines the language understanding capabilities of Large Language Models (LLMs) with classical machine learning methods, such as clustering, to build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. To assess our proposal, we adopt a state-of-the-art entity extraction model (UniNER) and estimate the relevance of the entities extracted on the basis of the generated schema. Results confirm the potential of our approach: the topic similarity score obtained with the automatically generated schema differs negligibly from the score obtained with the ground-truth schema (by less than 1% on average across three datasets). This outcome suggests that the proposed approach may be valuable for automatically creating an entity schema from a set of documents.
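
The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea of grouping candidate entity-type labels by clustering to obtain a compact schema. It rests on several assumptions that are not taken from the paper: the candidate labels are invented, character trigram overlap stands in for whatever LLM-based representation the authors use, and the functions `cluster_labels` and `infer_schema`, the `threshold` parameter, and the "shortest member names the cluster" heuristic are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(label, n=3):
    """Bag of character n-grams: a crude stand-in for an LLM embedding."""
    padded = f" {label.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_labels(labels, threshold=0.4):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops once the
    best available similarity drops below `threshold` (an assumed value).
    """
    vectors = {label: char_ngrams(label) for label in labels}
    clusters = [[label] for label in labels]

    def linkage(c1, c2):
        sims = [cosine(vectors[x], vectors[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def infer_schema(labels, threshold=0.4):
    """Name each cluster after its shortest member (hypothetical heuristic)."""
    clusters = cluster_labels(labels, threshold)
    return sorted(min(c, key=lambda s: (len(s), s)) for c in clusters)


if __name__ == "__main__":
    # Invented candidate labels, as an LLM might propose them per document.
    candidates = ["protein", "proteins", "protein name",
                  "cell line", "cell lines", "DNA"]
    print(infer_schema(candidates))
```

With these invented labels the near-duplicate variants collapse into three schema types. In a realistic setting the trigram vectors would be replaced by sentence or LLM embeddings, and the stopping threshold would be chosen with a cluster-quality measure rather than fixed by hand.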

Index Terms

Towards Zero-shot Knowledge Graph building: Automated Schema Inference
1. Information systems
  1. Information retrieval

Published In

UMAP Adjunct '24: Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization

June 2024

662 pages

ISBN: 9798400704666

DOI: 10.1145/3631700

General Chairs: Ludovico Boratto (University of Cagliari, Italy), Cristina Gena (University of Turin, Italy), Mirko Marras (University of Cagliari, Italy)

Program Chairs: Panagiotis Germanakos (SAP SE, Germany), Elvira Popescus (University of Craiova, Romania)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Ministero dell'Università e della Ricerca

Conference

UMAP '24

UMAP '24: 32nd ACM Conference on User Modeling, Adaptation and Personalization

July 1 - 4, 2024

Cagliari, Italy

Acceptance Rates

Overall Acceptance Rate 162 of 633 submissions, 26%
