DOI: 10.1145/3488560.3498437
Research Article | Public Access

DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning

Published: 15 February 2022

Abstract

Knowledge Graph Embedding (KGE) is a popular method for KG reasoning, and higher-dimensional KGEs are usually preferred because they offer better reasoning capability. However, high-dimensional KGEs place heavy demands on storage and computing resources, making them unsuitable for resource-limited or time-constrained applications where faster and cheaper reasoning is necessary. To address this problem, we propose DualDE, a knowledge distillation method that builds a low-dimensional student KGE from a pre-trained high-dimensional teacher KGE. DualDE considers the dual influence between the teacher and the student: we propose a soft label evaluation mechanism that adaptively assigns different soft-label and hard-label weights to different triples, and a two-stage distillation approach that improves the student's acceptance of the teacher. DualDE is general enough to be applied to various KGEs. Experimental results show that our method reduces the embedding parameters of a high-dimensional KGE by 7×–15× and increases inference speed by 2×–6× while retaining high performance. An ablation study further verifies the effectiveness of the soft label evaluation mechanism and the two-stage distillation approach.
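
To make the distillation objective concrete, below is a minimal PyTorch sketch of how a per-triple blend of hard-label and soft-label losses could be implemented. The function name dual_distill_loss, the score conventions, and the per-triple weighting heuristic are illustrative assumptions, not the paper's exact soft label evaluation mechanism, which is defined in the full text.

```python
import torch
import torch.nn.functional as F

def dual_distill_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor) -> torch.Tensor:
    """Blend hard-label and soft-label (teacher) losses per triple.

    student_logits, teacher_logits: shape (B,), plausibility scores that
        the student/teacher KGE assigns to a batch of B (head, relation,
        tail) triples.
    labels: shape (B,), 1.0 for positive triples, 0.0 for negative samples.
    """
    teacher_logits = teacher_logits.detach()  # the teacher is pre-trained

    # Hard-label term: fit the ground-truth labels of the triples.
    hard = F.binary_cross_entropy_with_logits(
        student_logits, labels, reduction="none")

    # Soft-label term: match the teacher's scores.
    soft = F.mse_loss(student_logits, teacher_logits, reduction="none")

    # Illustrative per-triple soft-label weight: trust the teacher more on
    # triples it already scores correctly (high probability on positives,
    # low on negatives). This heuristic is a stand-in for DualDE's soft
    # label evaluation mechanism.
    p_teacher = torch.sigmoid(teacher_logits)
    w_soft = labels * p_teacher + (1.0 - labels) * (1.0 - p_teacher)
    w_hard = 1.0 - w_soft

    return (w_hard * hard + w_soft * soft).mean()
```

The two-stage approach mentioned above would then reuse a loss of this shape while alternating which side of the teacher-student pair is updated, reflecting the dual influence the abstract describes; the exact staging is given in the full paper.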

Supplementary Material

MP4 File (WSDM22-fp348.mp4)
Presentation video


      Published In

      WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
      February 2022, 1690 pages
      ISBN: 9781450391320
      DOI: 10.1145/3488560

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. fast embedding
      2. knowledge distillation
      3. knowledge graph embedding


      Conference

      WSDM '22

      Acceptance Rates

      Overall Acceptance Rate 498 of 2,863 submissions, 17%

      Article Metrics

      • Downloads (last 12 months): 260
      • Downloads (last 6 weeks): 14

      Reflects downloads up to 13 Feb 2025

      Cited By

      • (2024) Improvement of Web Semantic and Transformer-Based Knowledge Graph Completion in Low-Dimensional Spaces. International Journal on Semantic Web and Information Systems 20(1), 1-18. DOI: 10.4018/IJSWIS.336919. Online publication date: 31-Jan-2024.
      • (2024) Knowledge Graph Embedding Using a Multi-Channel Interactive Convolutional Neural Network with Triple Attention. Mathematics 12(18), 2821. DOI: 10.3390/math12182821. Online publication date: 11-Sep-2024.
      • (2024) Towards continual knowledge graph embedding via incremental distillation. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 8759-8768. DOI: 10.1609/aaai.v38i8.28722. Online publication date: 20-Feb-2024.
      • (2024) A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multi-Modal. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 9456-9478. DOI: 10.1109/TPAMI.2024.3417451. Online publication date: Dec-2024.
      • (2024) From Wide to Deep: Dimension Lifting Network for Parameter-Efficient Knowledge Graph Embedding. IEEE Transactions on Knowledge and Data Engineering 36(12), 8341-8348. DOI: 10.1109/TKDE.2024.3437479. Online publication date: Dec-2024.
      • (2024) Overview of knowledge reasoning for knowledge graph. Neurocomputing 585(C). DOI: 10.1016/j.neucom.2024.127571. Online publication date: 7-Jun-2024.
      • (2024) Hierarchical knowledge graph relationship prediction leverage of axiomatic fuzzy set graph structure. Expert Systems with Applications 251(C). DOI: 10.1016/j.eswa.2024.124090. Online publication date: 24-Jul-2024.
      • (2024) VEML: an easy but effective framework for fusing text and structure knowledge on sparse knowledge graph completion. Data Mining and Knowledge Discovery 38(2), 343-371. DOI: 10.1007/s10618-023-01001-y. Online publication date: 6-Feb-2024.
      • (2023) Neuro-Logic Learning for Relation Reasoning over Event Knowledge Graph. 2023 IEEE International Conference on Intelligence and Security Informatics (ISI), 1-6. DOI: 10.1109/ISI58743.2023.10297212. Online publication date: 2-Oct-2023.
      • (2023) Compression Models via Meta-Learning and Structured Distillation for Named Entity Recognition. 2023 International Conference on Asian Language Processing (IALP), 90-94. DOI: 10.1109/IALP61005.2023.10336991. Online publication date: 18-Nov-2023.
