
Contrastive Language-knowledge Graph Pre-training

Published: 15 April 2024

Abstract

Recent years have witnessed a surge of academic interest in knowledge-enhanced pre-trained language models (PLMs) that incorporate factual knowledge to improve knowledge-driven applications. Nevertheless, existing studies rely primarily on shallow, static, and separately pre-trained entity embeddings, and few explore deep contextualized knowledge representations for knowledge incorporation. Consequently, the performance gains of such models remain limited. In this article, we introduce a simple yet effective knowledge-enhanced model, COLLEGE (Contrastive Language-Knowledge Graph Pre-training), which leverages contrastive learning to incorporate factual knowledge into PLMs. The approach keeps the knowledge in its original graph structure, preserving the richest available information while circumventing the problem of fusing heterogeneous embeddings. Experimental results demonstrate that our approach outperforms previous state-of-the-art methods on several knowledge-intensive tasks. Our code and trained models are available at https://github.com/Stacy027/COLLEGE.
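
To make the contrastive setup concrete, below is a minimal sketch of one plausible reading of such an objective: a symmetric InfoNCE loss with in-batch negatives that pulls each text representation toward the representation of its matching knowledge-graph neighborhood. The function name, tensor shapes, and temperature here are hypothetical assumptions, not the paper's actual implementation (see the linked repository for that).

```python
# Sketch of a symmetric InfoNCE objective between a language encoder and
# a knowledge-graph encoder. All names, shapes, and the temperature value
# are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(text_emb: torch.Tensor,
                               graph_emb: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    """Align text spans with their matching knowledge-graph neighborhoods.

    text_emb:  (batch, dim) pooled outputs of the language encoder
    graph_emb: (batch, dim) pooled outputs of a graph encoder run over the
               original subgraph structure (not a static embedding lookup)
    Row i of each tensor forms a positive pair; every other row in the
    batch serves as an in-batch negative.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    graph_emb = F.normalize(graph_emb, dim=-1)
    logits = text_emb @ graph_emb.T / temperature   # (batch, batch) cosine sims
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetrize over both retrieval directions: text-to-graph and graph-to-text.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))


# Toy usage with random stand-ins for the two encoders' outputs.
text_batch = torch.randn(8, 256)
graph_batch = torch.randn(8, 256)
print(contrastive_alignment_loss(text_batch, graph_batch).item())
```

Because the graph side is encoded directly from the subgraph rather than looked up from separately pre-trained entity vectors, an objective of this form can exploit deep contextualized knowledge representations while sidestepping heterogeneous embedding fusion.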


Cited By

  • (2025) Historical facts learning from Long-Short Terms with Language Model for Temporal Knowledge Graph Reasoning. Information Processing & Management 62, 3 (May 2025), Article 104047. DOI: 10.1016/j.ipm.2024.104047



    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 4
    April 2024
    221 pages
    EISSN: 2375-4702
    DOI: 10.1145/3613577

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 April 2024
    Online AM: 09 February 2024
    Accepted: 25 January 2024
    Revised: 19 November 2023
    Received: 31 May 2023
    Published in TALLIP Volume 23, Issue 4


    Author Tags

    1. Language Model
    2. Knowledge Graph
    3. Contrastive Learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • National Science Foundation of China

