DOI: 10.1145/3631700.3665234
research-article
Open access

Towards Zero-shot Knowledge Graph building: Automated Schema Inference

Published: 28 June 2024

Abstract

In the current digital transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in structured form. The dominant paradigm for automatically generating Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically constructed manually, which requires intensive human effort, and they are prone to information loss caused by negligence, incomplete analysis, or human subjectivity. Limiting human bias and the resulting information loss is paramount when building Knowledge Graphs, particularly for user modeling in sectors such as education and healthcare. To this end, we propose a novel approach for automatically generating an entity schema. The methodology combines the language understanding capabilities of Large Language Models (LLMs) with classical machine learning methods, such as clustering, to build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. To assess our proposal, we adopt a state-of-the-art entity extraction model (UniNER) and estimate the relevance of the entities it extracts under the generated schema. The results confirm the potential of our approach: the difference between the topic similarity score obtained with the ground truth and with the automatically generated schema is negligible (less than 1% on average across three datasets). This outcome suggests that the proposed approach can be valuable for automatically creating an entity schema from a set of documents.
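The abstract describes grouping LLM-derived entity information with clustering to obtain an entity schema, but it does not specify the algorithm. The following is only a minimal sketch of that general idea, under loud assumptions: the candidate type labels are hard-coded stand-ins for what an LLM might propose, character-trigram Jaccard similarity replaces any learned text representation, and the function names (`cluster_labels`, `build_schema`) and the `threshold` value are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def trigrams(label: str) -> set[str]:
    """Character trigrams of a lower-cased, space-padded label."""
    padded = f"  {label.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two trigram sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_labels(labels: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Average-linkage agglomerative clustering of candidate type labels.

    Merging stops once the best pair of clusters falls below `threshold`
    (an assumed, hand-picked value).
    """
    grams = {lab: trigrams(lab) for lab in labels}
    clusters = [[lab] for lab in dict.fromkeys(labels)]  # dedupe, keep order

    def linkage(c1: list[str], c2: list[str]) -> float:
        sims = [jaccard(grams[x], grams[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def build_schema(labels: list[str], threshold: float = 0.3) -> list[str]:
    """Pick one entity type per cluster: the shortest member label."""
    return sorted(min(c, key=len) for c in cluster_labels(labels, threshold))


# Hypothetical candidate labels, standing in for LLM output over a corpus.
candidates = ["person", "persons", "organization",
              "organisation", "location", "locations"]
schema = build_schema(candidates)
print(schema)
```

In a realistic pipeline the similarity function would more plausibly operate on text embeddings, and the stopping point could be chosen with a cluster-quality measure rather than a fixed threshold; both choices are left open here.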


Information

Published In

UMAP Adjunct '24: Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization
June 2024
662 pages
ISBN: 9798400704666
DOI: 10.1145/3631700
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Large Language Models
  2. Named Entity Recognition
  3. Ontology Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

UMAP '24
Acceptance Rates

Overall Acceptance Rate 162 of 633 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 341
  • Downloads (Last 12 months): 341
  • Downloads (Last 6 weeks): 67

Reflects downloads up to 11 Feb 2025
