Abstract
Knowledge Graph Embedding (KGE) aims to learn representations of the entities and relations in a knowledge graph (KG). Graph Neural Networks (GNNs) have recently achieved great success on KGE, but the reason for this success is usually attributed simply to their ability to learn the structure of the knowledge graph, which leaves the internal mechanism poorly understood. In this work, we first study a fundamental problem: what factors make GNNs helpful for KGE? To investigate this problem, we examine the core idea of current GNN models for KGs and propose a new assumption of relational homophily, namely that connected nodes possess similar features after being transformed by their relation, to explain why aggregating neighbors together with relations can help KGE. Based on model and empirical analyses, we then introduce KSG-GNN, a novel data-centric framework for applying GNNs to KGE. In KSG-GNN, we construct a new graph structure from the KG, named the Knowledge Similarity Graph (KSG), in which each node is connected to its most similar nodes as neighbors, and we then apply GNNs on this graph to perform KGE. Instead of relying on the relational homophily assumption in the KG, the KSG behaves like a homogeneous graph that directly satisfies the standard homophily assumption. Hence, any GNN developed for homogeneous graphs, such as GCN, GAT, or GraphSAGE, can be applied out of the box as a KSG-GNN without modification, which yields a more general and effective GNN paradigm. Finally, we conduct extensive experiments on two benchmark datasets, FB15k-237 and WN18RR, demonstrating the superior performance of KSG-GNN over multiple strong baselines. The source code is available at https://github.com/advancer99/WWWJ-KGE.
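The KSG construction described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: it assumes cosine similarity and a top-k neighbor rule for building the similarity graph, followed by one GCN/GraphSAGE-style mean aggregation step; the paper's exact similarity measure and aggregation are not specified in the abstract, so all function names here are illustrative.

```python
# Hypothetical sketch of the Knowledge Similarity Graph (KSG) idea:
# connect each entity to its top-k most similar entities, then
# aggregate neighbor features as a homophilous GNN layer would.
# The cosine similarity and top-k rule are assumptions for illustration.
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_ksg(embeddings, k):
    """Link each node to its k most similar nodes (excluding itself)."""
    n = len(embeddings)
    neighbors = {}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(embeddings[i], embeddings[j]),
            reverse=True,
        )
        neighbors[i] = ranked[:k]
    return neighbors

def mean_aggregate(embeddings, neighbors):
    """One mean-aggregation step (GCN/GraphSAGE style) on the KSG."""
    out = []
    for i, vec in enumerate(embeddings):
        group = [vec] + [embeddings[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out
```

Because the resulting graph links similar entities directly, any off-the-shelf homogeneous-graph GNN can operate on it without relation-specific machinery, which is the point the abstract makes.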






Data Availability
Data is provided within the manuscript.
Notes
Relational homophily can hold in two directions: from h through r to t, or from t through the inverse of r to h.
We also use the same formulation for (3), which verifies that neighbors can learn a matching pattern similar to that of the central node; its focus differs from the analysis here.
Concrete details are provided in the Experiments section.
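The two directions of relational homophily in the first note can be made concrete with a toy transform. This sketch assumes a TransE-style translational transform (h + r ≈ t, so the inverse of r is -r); the paper's actual relational transform may differ, and all names here are illustrative.

```python
# Toy illustration of relational homophily in both directions,
# assuming a TransE-style translation as the relational transform.
def transform(entity, relation):
    """Apply a translational relation transform to an entity vector."""
    return [e + r for e, r in zip(entity, relation)]

def inverse(relation):
    """The inverse of a translational relation is its negation."""
    return [-r for r in relation]

h = [0.2, 0.5]
r = [0.3, -0.1]
t = transform(h, r)                 # direction 1: h through r lands near t
h_back = transform(t, inverse(r))   # direction 2: t through inverse r recovers h
```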
Funding
This work is supported by the National Key Research and Development Program of China (No. 2022YFB3102200), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0680101), and the National Natural Science Foundation of China (No. 62402491, No. 62472416, and No. 62376064).
Author information
Authors and Affiliations
Contributions
Yanao Cao and Xixun Lin wrote the main manuscript text. Yongxuan Wu and Fengzhao Shi implemented the model and conducted the experiments. Yanmin Shang prepared the figures and tables. Qingfeng Tan, Chuan Zhou and Peng Zhang reviewed the manuscript and provided feedback for revision.
Corresponding author
Ethics declarations
Competing Interests
The authors declare no competing interests.
Ethical Approval
Not applicable: this manuscript does not contain any studies with human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cao, Y., Lin, X., Wu, Y. et al. A data-centric framework of improving graph neural networks for knowledge graph embedding. World Wide Web 28, 2 (2025). https://doi.org/10.1007/s11280-024-01320-0