Abstract
Knowledge Graph Embedding (KGE) is a powerful technique for mining knowledge from knowledge graphs. Negative sampling plays a critical role in KGE training and significantly affects the performance of KGE models. Typical negative sampling methods keep the Entity-Relation (ER) pair of each positive triple fixed and replace the remaining entity with negative entities drawn randomly from the entity set, so that every positive triple receives the same number of negative samples. However, the distribution of ER pairs is often long-tailed, which makes assigning the same number of negative samples to each ER pair problematic, an issue overlooked in most related work. This paper investigates the impact of assigning a uniform number of negative samples to ER pairs during training and demonstrates that this practice prevents training from reaching the optimum of the negative sampling loss and undermines the objective of the trained model. To address this issue, we propose a novel ER distribution-aware negative sampling method that adaptively assigns a varying number of negative samples to each ER pair based on its distribution characteristics. Our method also mitigates the introduction of false negative samples, a problem common to many negative sampling methods. The approach is grounded in theoretical analysis and practical considerations and can be applied to most KGE models. We validate its effectiveness on both conventional and neural network-based KGE models, and our experiments show that it outperforms most state-of-the-art negative sampling methods.
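To make the core idea concrete, below is a minimal sketch of ER distribution-aware negative allocation with false-negative filtering, restricted to tail replacement for brevity. The allocation rule (a capped multiple of the ER pair's frequency) and all names (`sample_negatives`, `base_k`, `max_k`) are illustrative assumptions, not the paper's actual scheme.

```python
from collections import Counter
import random

def sample_negatives(triples, entities, base_k=2, max_k=64):
    """Assign each positive triple a number of negatives that grows with the
    frequency of its (head, relation) pair, rather than a fixed constant,
    and filter out observed true tails to avoid false negatives."""
    er_counts = Counter((h, r) for h, r, _ in triples)
    true_tails = {}
    for h, r, t in triples:
        true_tails.setdefault((h, r), set()).add(t)

    negatives = {}
    for h, r, t in triples:
        # Illustrative rule: frequent (head-of-the-distribution) ER pairs
        # receive more negatives, capped at max_k; rare ER pairs receive fewer.
        k = min(max_k, base_k * er_counts[(h, r)])
        # Exclude observed positive tails so no false negative is sampled.
        candidates = [e for e in entities if e not in true_tails[(h, r)]]
        negatives[(h, r, t)] = random.sample(candidates, min(k, len(candidates)))
    return negatives

triples = [("a", "r1", "b"), ("a", "r1", "c"), ("d", "r2", "b")]
entities = ["a", "b", "c", "d", "e"]
print(sample_negatives(triples, entities))
```

Under this illustrative rule, an ER pair that appears once receives `base_k` negatives, while one that appears many times receives up to `max_k`, reflecting the long-tailed ER distribution the abstract describes.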
Notes
1. In the implementation, observed positive entities are filtered out. However, since the total number of entities \(|\mathcal{E}|\) is significantly larger than the frequency \(\#(e,r)\) of an ER pair, we set \(p_n(q \mid (e,r)) = \frac{1}{|\mathcal{E}|}\) to simplify the theoretical analysis.
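To spell out the approximation in the note: if the \(\#(e,r)\) observed positive entities are excluded from an otherwise uniform noise distribution (assuming \(\#(e,r)\) counts the distinct filtered entities), the exact and simplified forms relate as

```latex
p_n(q \mid (e,r)) \;=\; \frac{1}{|\mathcal{E}| - \#(e,r)}
\;\approx\; \frac{1}{|\mathcal{E}|},
\qquad \text{since } |\mathcal{E}| \gg \#(e,r).
```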
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yao, N., Liu, Q., Yang, Y., Li, W., Bai, Q. (2023). Entity-Relation Distribution-Aware Negative Sampling for Knowledge Graph Embedding. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_13