Abstract
Knowledge Graph Embedding (KGE) aims to learn representations of the entities and relations in a knowledge graph (KG). Graph Neural Networks (GNNs) have recently achieved great success on KGE, but the reason for this success is usually attributed simply to their ability to learn the structure of the knowledge graph, which leaves the internal mechanism poorly understood. In this work, we first study a fundamental problem: what factors make GNNs helpful for KGE? To investigate this problem, we examine the core idea of current GNN models for KGs and propose a new assumption of relational homophily, namely that connected nodes possess similar features after being transformed by their relation, to explain why aggregating neighbors together with relations can help KGE. Based on model and empirical analyses, we then introduce KSG-GNN, a novel data-centric framework for applying GNNs to KGE. In KSG-GNN, we construct a new graph structure from the KG, named the Knowledge Similarity Graph (KSG), in which each node is connected to its most similar nodes as neighbors, and we then apply GNNs on this graph to perform KGE. Instead of relying on the relational homophily assumption in the KG, the KSG behaves like a homogeneous graph that directly satisfies the standard homophily assumption. Hence, any GNN developed for homogeneous graphs, such as GCN, GAT, or GraphSAGE, can be applied out of the box as a KSG-GNN without modification, which yields a more general and effective GNN paradigm. Finally, we conduct extensive experiments on two benchmark datasets, FB15k-237 and WN18RR, demonstrating the superior performance of KSG-GNN over multiple strong baselines. The source code is available at https://github.com/advancer99/WWWJ-KGE.
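The KSG construction described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: it assumes cosine similarity and a top-k neighbor rule for building the similarity graph, followed by one GCN/GraphSAGE-style mean aggregation step; the paper's exact similarity measure and aggregation are not specified in the abstract, so all function names here are illustrative.

```python
# Hypothetical sketch of the Knowledge Similarity Graph (KSG) idea:
# connect each entity to its top-k most similar entities, then
# aggregate neighbor features as a homophilous GNN layer would.
# The cosine similarity and top-k rule are assumptions for illustration.
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_ksg(embeddings, k):
    """Link each node to its k most similar nodes (excluding itself)."""
    n = len(embeddings)
    neighbors = {}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(embeddings[i], embeddings[j]),
            reverse=True,
        )
        neighbors[i] = ranked[:k]
    return neighbors

def mean_aggregate(embeddings, neighbors):
    """One mean-aggregation step (GCN/GraphSAGE style) on the KSG."""
    out = []
    for i, vec in enumerate(embeddings):
        group = [vec] + [embeddings[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out
```

Because the resulting graph links similar entities directly, any off-the-shelf homogeneous-graph GNN can operate on it without relation-specific machinery, which is the point the abstract makes.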






Data Availability
Data is provided within the manuscript.
Notes
Relational homophily can hold in two directions: from h through r to t, or from t through the inverse of r to h.
We also use the same formulation for (3), which verifies that neighbors can learn a matching pattern similar to that of the central node; its focus differs from the analysis here.
Concrete details are provided in the Experiments section.
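The two directions of relational homophily in the first note can be made concrete with a toy transform. This sketch assumes a TransE-style translational transform (h + r ≈ t, so the inverse of r is -r); the paper's actual relational transform may differ, and all names here are illustrative.

```python
# Toy illustration of relational homophily in both directions,
# assuming a TransE-style translation as the relational transform.
def transform(entity, relation):
    """Apply a translational relation transform to an entity vector."""
    return [e + r for e, r in zip(entity, relation)]

def inverse(relation):
    """The inverse of a translational relation is its negation."""
    return [-r for r in relation]

h = [0.2, 0.5]
r = [0.3, -0.1]
t = transform(h, r)                 # direction 1: h through r lands near t
h_back = transform(t, inverse(r))   # direction 2: t through inverse r recovers h
```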
Funding
This work is supported by the National Key Research and Development Program of China (No. 2022YFB3102200), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0680101), and the National Natural Science Foundation of China (No. 62402491, No. 62472416, and No. 62376064).
Author information
Authors and Affiliations
Contributions
Yanao Cao and Xixun Lin wrote the main manuscript text. Yongxuan Wu and Fengzhao Shi implemented the model and conducted the experiments. Yanmin Shang prepared the figures and tables. Qingfeng Tan, Chuan Zhou and Peng Zhang reviewed the manuscript and provided feedback for revision.
Corresponding author
Ethics declarations
Competing Interests
The authors declare no competing interests.
Ethical Approval
Not applicable: this manuscript does not contain any studies with human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cao, Y., Lin, X., Wu, Y. et al. A data-centric framework of improving graph neural networks for knowledge graph embedding. World Wide Web 28, 2 (2025). https://doi.org/10.1007/s11280-024-01320-0