On Contrasting YAGO with GPT-J: An Experiment for Person-Related Attributes

  • Conference paper
  • First Online:

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1686)

Abstract

Language models (LMs) trained on large text corpora have demonstrated superior performance on a range of language-related tasks in recent years. These models implicitly incorporate factual knowledge that can be used to complement existing Knowledge Graphs (KGs), which are in most cases structured from human-curated databases. Here we report an experiment that attempts to gain insight into the extent to which LMs can generate factual information such as that present in KGs. Concretely, we have tested this process using the English Wikipedia subset of YAGO and the GPT-J model, for attributes related to individuals. Results show that the generation of correct factual information depends on the generation parameters of the model and is unevenly distributed across individuals. Further, the LM can be used to populate additional factual information, but this requires intermediate parsing to map the output correctly to KG attributes.
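
The probing pipeline itself is not shown on this page, but the footnoted tools (the Hugging Face GPT-J model and the thefuzz fuzzy-matching library, see Notes 5 and 6) suggest the general shape of such an experiment. The following is a minimal sketch under those assumptions, not the authors' code: the prompt, the example person, the expected value, and the generation parameters are hypothetical placeholders chosen for illustration.

    # Minimal sketch (assumption, not the authors' pipeline): prompt GPT-J with a
    # cloze-style sentence about a person and fuzzy-match the continuation against
    # the attribute value stored in YAGO.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from thefuzz import fuzz  # https://github.com/seatgeek/thefuzz

    model_name = "EleutherAI/gpt-j-6B"  # GPT-J checkpoint on the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Marie Curie was born in"  # hypothetical probe for a birth-place attribute
    expected = "Warsaw"                 # value read from the KG for comparison

    inputs = tokenizer(prompt, return_tensors="pt")
    # Generation parameters (temperature, top_p, number of new tokens) strongly
    # influence whether the factual value surfaces in the output.
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

    # Intermediate parsing step: fuzzy matching decides whether the KG value
    # appears in the free-form generation.
    score = fuzz.partial_ratio(expected.lower(), generated.lower())
    print(f"Generated: {generated!r} | match score against {expected!r}: {score}")

In practice, such probes would be iterated over the person entities and attributes extracted from the YAGO dump (for example with an N-Triples parser such as N3.js, see Note 3), aggregating match scores per attribute and per generation setting.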

Notes

  1. https://yago-knowledge.org/.

  2. https://yago-knowledge.org/.

  3. https://github.com/rdfjs/N3.js/.

  4. http://yago-knowledge.org/resource/Human.

  5. https://huggingface.co/docs/transformers/model_doc/gptj.

  6. https://github.com/seatgeek/thefuzz.

Author information

Corresponding author

Correspondence to Miguel-Angel Sicilia.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martin-Moncunill, D., Sicilia, MA., González, L., Rodríguez, D. (2022). On Contrasting YAGO with GPT-J: An Experiment for Person-Related Attributes. In: Villazón-Terrazas, B., Ortiz-Rodriguez, F., Tiwari, S., Sicilia, MA., Martín-Moncunill, D. (eds) Knowledge Graphs and Semantic Web. KGSWC 2022. Communications in Computer and Information Science, vol 1686. Springer, Cham. https://doi.org/10.1007/978-3-031-21422-6_17

  • DOI: https://doi.org/10.1007/978-3-031-21422-6_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21421-9

  • Online ISBN: 978-3-031-21422-6

  • eBook Packages: Computer Science, Computer Science (R0)
