On Contrasting YAGO with GPT-J: An Experiment for Person-Related Attributes

  • Conference paper
  • First Online:

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1686)

Abstract

Language models (LMs) trained on large text corpora have demonstrated superior performance on a range of language-related tasks in recent years. These models implicitly incorporate factual knowledge that can be used to complement existing Knowledge Graphs (KGs), which are in most cases structured from human-curated databases. Here we report an experiment that attempts to gain insight into the extent to which LMs can generate factual information such as that present in KGs. Concretely, we have tested this process using the English Wikipedia subset of YAGO and the GPT-J model, for attributes related to individuals. Results show that the generation of correct factual information depends on the generation parameters of the model and is unevenly distributed across individuals. Further, the LM can be used to populate additional factual information, but this requires intermediate parsing to map the output correctly to KG attributes.
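
The probing pipeline itself is not shown on this page, but the footnoted tools (the Hugging Face GPT-J model and the thefuzz fuzzy-matching library, see Notes 5 and 6) suggest the general shape of such an experiment. The following is a minimal sketch under those assumptions, not the authors' code: the prompt, the example person, the expected value, and the generation parameters are hypothetical placeholders chosen for illustration.

    # Minimal sketch (assumption, not the authors' pipeline): prompt GPT-J with a
    # cloze-style sentence about a person and fuzzy-match the continuation against
    # the attribute value stored in YAGO.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from thefuzz import fuzz  # https://github.com/seatgeek/thefuzz

    model_name = "EleutherAI/gpt-j-6B"  # GPT-J checkpoint on the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Marie Curie was born in"  # hypothetical probe for a birth-place attribute
    expected = "Warsaw"                 # value read from the KG for comparison

    inputs = tokenizer(prompt, return_tensors="pt")
    # Generation parameters (temperature, top_p, number of new tokens) strongly
    # influence whether the factual value surfaces in the output.
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

    # Intermediate parsing step: fuzzy matching decides whether the KG value
    # appears in the free-form generation.
    score = fuzz.partial_ratio(expected.lower(), generated.lower())
    print(f"Generated: {generated!r} | match score against {expected!r}: {score}")

In practice, such probes would be iterated over the person entities and attributes extracted from the YAGO dump (for example with an N-Triples parser such as N3.js, see Note 3), aggregating match scores per attribute and per generation setting.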

Notes

  1. https://yago-knowledge.org/.

  2. https://yago-knowledge.org/.

  3. https://github.com/rdfjs/N3.js/.

  4. http://yago-knowledge.org/resource/Human.

  5. https://huggingface.co/docs/transformers/model_doc/gptj.

  6. https://github.com/seatgeek/thefuzz.

Author information

Corresponding author

Correspondence to Miguel-Angel Sicilia.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martin-Moncunill, D., Sicilia, MA., González, L., Rodríguez, D. (2022). On Contrasting YAGO with GPT-J: An Experiment for Person-Related Attributes. In: Villazón-Terrazas, B., Ortiz-Rodriguez, F., Tiwari, S., Sicilia, MA., Martín-Moncunill, D. (eds) Knowledge Graphs and Semantic Web. KGSWC 2022. Communications in Computer and Information Science, vol 1686. Springer, Cham. https://doi.org/10.1007/978-3-031-21422-6_17

  • DOI: https://doi.org/10.1007/978-3-031-21422-6_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21421-9

  • Online ISBN: 978-3-031-21422-6

  • eBook Packages: Computer Science, Computer Science (R0)
