
A data-centric framework of improving graph neural networks for knowledge graph embedding

Published in: World Wide Web

Abstract

Knowledge Graph Embedding (KGE) aims to learn representations of the entities and relations in a knowledge graph (KG). Graph Neural Networks (GNNs) have recently achieved great success on KGE, but the reason behind this success remains poorly understood: most explanations simply attribute it to GNNs' ability to learn the structure of the knowledge graph. In this work, we first study a fundamental question, namely, which factors make GNNs helpful for KGE. To investigate it, we examine the core idea of current GNN models for KGs and propose a new assumption of relational homophily (connected nodes possess similar features after the relation's transformation) to explain why aggregating neighbors along relations helps KGE. Based on model and empirical analyses, we then introduce KSG-GNN, a novel data-centric framework for applying GNNs to KGE. In KSG-GNN, we construct a new graph structure from the KG, named the Knowledge Similarity Graph (KSG), in which each node is connected to its most similar nodes as neighbors, and we then apply GNNs on this graph to perform KGE. Instead of relying on the relational homophily assumption in the KG, the KSG aligns with homogeneous graphs and directly satisfies the standard homophily assumption. Hence, any GNN developed for homogeneous graphs, such as GCN, GAT, or GraphSAGE, can be applied out of the box as a KSG-GNN without modification, which provides a more general and effective GNN paradigm. Finally, we conduct extensive experiments on two benchmark datasets, FB15k-237 and WN18RR, demonstrating the superior performance of KSG-GNN over multiple strong baselines. The source code is available at https://github.com/advancer99/WWWJ-KGE.
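The data-centric pipeline the abstract describes (build a similarity graph over entities, then run an off-the-shelf homogeneous GNN on it) can be illustrated with a minimal sketch. The k-nearest-neighbor construction via cosine similarity and the mean-aggregation layer below are illustrative assumptions, not the paper's exact similarity measure or architecture.

```python
import numpy as np

def build_ksg(entity_emb: np.ndarray, k: int) -> list[list[int]]:
    """Toy Knowledge Similarity Graph: connect each entity to its k most
    similar entities by cosine similarity (illustrative only; the paper's
    actual similarity measure and construction may differ)."""
    normed = entity_emb / np.linalg.norm(entity_emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    return [list(np.argsort(-sim[i])[:k]) for i in range(len(sim))]

def gnn_layer(entity_emb: np.ndarray, neighbors: list[list[int]]) -> np.ndarray:
    """One GraphSAGE-style mean-aggregation layer on the KSG. Because the
    KSG is homogeneous, no relation-specific transform is needed."""
    agg = np.stack([entity_emb[nbrs].mean(axis=0) for nbrs in neighbors])
    return (entity_emb + agg) / 2.0  # combine self and neighbor information

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))      # 5 entities, 8-dim embeddings
ksg = build_ksg(emb, k=2)          # each entity linked to its 2 nearest peers
out = gnn_layer(emb, ksg)          # smoothed embeddings, same shape as input
```

Because the KSG satisfies the homophily assumption by construction, any homogeneous-graph aggregator could be swapped in for `gnn_layer` here without relation-aware machinery.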


(Figures 1-6 of the article are not reproduced here.)


Data Availability

Data is provided within the manuscript.

Notes

  1. Relational homophily can be achieved in two directions: from h through r to t, or from t through the inverse of r to h.

  2. We also use the same formulation for Eq. (3), which verifies that neighbors can learn a matching pattern similar to that of the central node; its focus differs from the one here.

  3. The concrete details are provided in the Experiments section.

  4. https://docs.dgl.ai/
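The two directions of relational homophily mentioned in Note 1 can be made concrete with a TransE-style translation. The translational scoring used below is an assumption for illustration; the paper's framework is not tied to TransE.

```python
import numpy as np

# Illustrative TransE-style check: for a triple (h, r, t), relational
# homophily says h transformed by r should be close to t, and symmetrically
# t transformed by the inverse relation (-r under translation) close to h.
h = np.array([1.0, 2.0, 0.5])
r = np.array([0.5, -1.0, 0.25])
t = h + r  # an exact translation, so both directions hold

forward_dist = np.linalg.norm((h + r) - t)  # direction 1: h through r to t
inverse_dist = np.linalg.norm((t - r) - h)  # direction 2: t through inverse r to h
```

Under any translational model, a small `forward_dist` implies a small `inverse_dist` and vice versa, which is why the two directions in Note 1 are interchangeable.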


Funding

This work was supported by the National Key Research and Development Program of China (No. 2022YFB3102200), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB0680101), and the National Natural Science Foundation of China (Nos. 62402491, 62472416 and 62376064).

Author information


Contributions

Yanao Cao and Xixun Lin wrote the main manuscript text. Yongxuan Wu and Fengzhao Shi implemented the model and conducted the experiments. Yanmin Shang prepared the figures and tables. Qingfeng Tan, Chuan Zhou and Peng Zhang reviewed the manuscript and provided feedback for revision.

Corresponding author

Correspondence to Xixun Lin.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Ethical Approval

This declaration is not applicable because our manuscript does not contain any studies involving human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cao, Y., Lin, X., Wu, Y. et al. A data-centric framework of improving graph neural networks for knowledge graph embedding. World Wide Web 28, 2 (2025). https://doi.org/10.1007/s11280-024-01320-0

