ABSTRACT
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the language understanding abilities of deep language models by leveraging rich semantic knowledge from knowledge graphs in addition to plain pre-training texts. However, previous efforts mostly use homogeneous knowledge (especially structured relation triples from knowledge graphs) to enhance the context-aware representations of entity mentions, so their performance may be limited by the coverage of the underlying knowledge graphs. It is also unclear whether these KEPLMs truly understand the injected semantic knowledge, due to their "black-box" training mechanism. In this paper, we propose a novel KEPLM named HORNET, which integrates Heterogeneous knowledge from various structured and unstructured sources into the RoBERTa NETwork and hence takes full advantage of both linguistic and factual knowledge. Specifically, we design a hybrid attention heterogeneous graph convolution network (HaHGCN) to learn heterogeneous knowledge representations from both the structured relation triples of knowledge graphs and the unstructured entity description texts. Meanwhile, we propose explicit dual knowledge understanding tasks to induce a more effective infusion of this heterogeneous knowledge, guiding the model to learn the complicated mappings from the knowledge graph embedding space to the deep context-aware embedding space and vice versa. Experiments show that HORNET outperforms various KEPLM baselines on knowledge-aware tasks, including knowledge probing, entity typing, and relation extraction. Our model also achieves substantial improvements over other KEPLMs on several GLUE benchmark datasets.
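To make the architecture concrete, the sketch below (PyTorch; all module and variable names are our illustrative assumptions, not the authors' released implementation) shows one way the hybrid attention fusion and heterogeneous graph convolution described above could be realized: each entity obtains one view from its relation triples (e.g., a TransE-style embedding) and one from its description text (e.g., a sentence-encoder embedding); an attention gate weighs the two views per entity, and a relation-aware graph convolution then propagates the fused states over the heterogeneous graph.

```python
# Hypothetical sketch of hybrid-attention fusion over heterogeneous
# knowledge sources (structured triples + entity descriptions).
# Names, shapes, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttentionFusion(nn.Module):
    """Attention gate over the triple-based and description-based views."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each knowledge view

    def forward(self, triple_emb: torch.Tensor, desc_emb: torch.Tensor):
        views = torch.stack([triple_emb, desc_emb], dim=1)  # (N, 2, d)
        alpha = F.softmax(self.score(views), dim=1)         # (N, 2, 1)
        return (alpha * views).sum(dim=1)                   # (N, d)

class RelationAwareGCNLayer(nn.Module):
    """One R-GCN-style convolution step over typed (src, rel, dst) edges."""
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_weight = nn.Parameter(0.02 * torch.randn(num_relations, dim, dim))
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, src, rel, dst):
        # h: (N, d) node states; src/rel/dst: (E,) index tensors
        msg = torch.einsum('ed,edo->eo', h[src], self.rel_weight[rel])
        out = self.self_loop(h).index_add(0, dst, msg)  # aggregate into dst
        return torch.relu(out)
```

The per-entity softmax gate lets the model lean on textual descriptions wherever knowledge graph coverage is sparse, which is exactly the limitation of homogeneous, triple-only KEPLMs noted above.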
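Similarly, the dual knowledge understanding objective can be pictured as a bidirectional translation between the two embedding spaces. A minimal sketch follows, under the assumption that the mappings are linear and trained with a regression loss (both choices are ours for illustration, not taken from the paper):

```python
# Hypothetical sketch of a bidirectional "dual knowledge understanding"
# objective: map contextual mention embeddings into the KG embedding
# space and KG entity embeddings back into the contextual space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualSpaceAlignment(nn.Module):
    def __init__(self, ctx_dim: int, kg_dim: int):
        super().__init__()
        self.ctx2kg = nn.Linear(ctx_dim, kg_dim)  # context -> KG space
        self.kg2ctx = nn.Linear(kg_dim, ctx_dim)  # KG -> context space

    def forward(self, ctx_emb: torch.Tensor, kg_emb: torch.Tensor):
        # ctx_emb: (B, ctx_dim) mention representations from the encoder
        # kg_emb:  (B, kg_dim) embeddings of the linked entities
        loss_fwd = F.mse_loss(self.ctx2kg(ctx_emb), kg_emb)
        loss_bwd = F.mse_loss(self.kg2ctx(kg_emb), ctx_emb)
        return loss_fwd + loss_bwd  # added to the pre-training loss
```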