Abstract
Existing probability-based knowledge representation learning models obtain the embeddings of knowledge graph objects through binary classification modeling at the level of the triple structure, which is coarse in granularity, and the negative sampling used by most current knowledge representation learning models has low space-time efficiency. To address these problems, this paper proposes a knowledge representation learning model, KRL_Match, which matches knowledge graph objects centered on one kind of knowledge graph object (head entity, tail entity, or relation) and performs multi-classification learning to determine the true matching, together with dynamic implicit negative sampling. Specifically, we first match the target and source classes of the same kind of knowledge graph object against each other by matrix multiplication of their embeddings in a knowledge graph batch sample space, which is constructed by random sampling from the universe set of the knowledge graph instance; the matching sample spaces of the knowledge graph objects are generated implicitly at the same time. We then measure the matching degree of each knowledge graph object matching by softmax regression multi-classification in each implicit sample space. Finally, we fit the true probability with the matching degree by optimizing a cross-entropy loss under the local closed world assumption. Inspired by the attention mechanism, we conduct knowledge graph object matching for knowledge representation learning and are, to the best of our knowledge, the first to introduce a dynamic implicit negative sampling method into knowledge representation learning. Experiments show that KRL_Match outperforms the baselines: on the entity prediction task, Hits@10 (filter) increases by 12.2% and 6.1% on the FB15K and FB15K237 benchmarks, respectively, and on the triple classification task, accuracy increases by 12.6% on the FB13 benchmark. In addition, a space-time efficiency test on FB15K (BS = 12000) shows that the negative sampling of KRL_Match takes 7395.59 s less time and half the storage space compared with TransE's.
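As an illustration only (not code from the paper), the following is a minimal PyTorch sketch of the kind of in-batch matching the abstract describes, under the assumption that the true matching of the i-th target object is the i-th source object, so that the off-diagonal entries of the batch score matrix act as dynamic implicit negative samples; the actual KRL_Match matching operation and batch construction may differ:

    import torch
    import torch.nn.functional as F

    def batch_matching_loss(target_emb: torch.Tensor,
                            source_emb: torch.Tensor) -> torch.Tensor:
        """Hypothetical sketch: target_emb and source_emb are (BS, d)
        embeddings of the target and source knowledge graph objects
        sampled into one batch."""
        # Matrix multiplication matches every target against every source,
        # implicitly generating a (BS, BS) matching sample space whose
        # off-diagonal entries serve as dynamic implicit negative samples.
        logits = target_emb @ source_emb.t()
        # Assumption: the true matching of target i is source i (diagonal).
        labels = torch.arange(target_emb.size(0), device=logits.device)
        # Softmax regression multi-classification with a cross-entropy loss
        # fits the matching degree to the probability of the true matching
        # under the local closed world assumption.
        return F.cross_entropy(logits, labels)

    # Usage sketch with random embeddings (BS = 4, d = 8):
    loss = batch_matching_loss(torch.randn(4, 8), torch.randn(4, 8))

Note that no explicit negative triples are constructed or stored here: the negatives fall out of the single matrix multiplication, which is consistent with the space-time efficiency claim in the abstract.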










Notes
Some pictures are reproduced from https://www.imdb.com/ and https://www.1905.com/.
To solve the problem of non-normalized parametric probability density estimation, the basic idea of NCE [43] is to transform the density estimation problem into a binary classification problem that distinguishes samples drawn from the data distribution from samples drawn from a known noise distribution (see the objective sketched after these notes).
For example, FB15K, WN18, etc.; see Sect. 4 for details.
We also regard the triple as a kind of knowledge graph object, but in this paper "knowledge graph object" generally refers to an entity or a relation.
Strictly speaking, the operation is performed on their embeddings. Following TransE [18], we adopt a simple addition and subtraction operation in this paper (sketched after these notes); the other operations are analogous.
In this paper, each of the multiple negative samples needs to participate in the determination of the threshold.
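For the two notes above, the following displays are standard formulations from the cited literature, not reproduced from this paper. The NCE [43] objective trains a binary classifier to distinguish data samples from noise samples: with model density \(p_\theta \), noise density \(p_n\), and \(k\) noise samples per data sample, it maximizes

\( J(\theta ) = \mathbb {E}_{x \sim p_d}\left[ \log \frac{p_\theta (x)}{p_\theta (x) + k\,p_n(x)} \right] + k\,\mathbb {E}_{x \sim p_n}\left[ \log \frac{k\,p_n(x)}{p_\theta (x) + k\,p_n(x)} \right] \).

The addition-and-subtraction operation of TransE [18] scores a triple \((h,r,t)\) by the distance \( f_r(h,t) = \Vert \textbf{h} + \textbf{r} - \textbf{t} \Vert _{L1/L2} \), so that \(\textbf{h} + \textbf{r} \approx \textbf{t}\) holds for true triples.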
References
Lin Y, Han X, Xie R, Liu Z, Sun M (2018) Knowledge representation learning: a quantitative review. arXiv preprint arXiv:1812.10901, pp 1–57
Wang Q, Mao Z, Wang B, Guo L (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Trans Knowl Data Eng (TKDE) 29:2724–2743. https://doi.org/10.1109/TKDE.2017.2754499
Ji S, Pan S, Cambria E, Marttinen P, Yu PS (2021) A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans Neural Netw Learn Syst (TNNLS). https://doi.org/10.1109/TNNLS.2021.3070843
Chen X, Jia S, Xiang Y (2020) A review: Knowledge reasoning over knowledge graph. Expert Syst Appl 141:112948.1-112948.21. https://doi.org/10.1016/j.eswa.2019.112948
Nguyen HL, Vu DT, Jung JJ (2020) Knowledge graph fusion for smart systems: a survey. Inf Fus 61:56–70
Cui H, Peng T, Feng L, Bao T, Liu L (2021) Simple question answering over knowledge graph enhanced by question pattern classification. Knowl Inf Syst. https://doi.org/10.1007/s10115-021-01609-w
Bengio Y, Senecal J-S (2008) Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Trans Neural Netw 19(4):713–722
Kotnis B, Nastase V (2018) Analysis of the impact of negative sampling on link prediction in knowledge graphs. arXiv preprint arXiv:1708.06816v2
Rossi A, Barbosa D, Firmani D, Matinata A, Merialdo P (2021) Knowledge graph embedding for link prediction: a comparative analysis. ACM Trans Knowl Discov Data (TKDD) 15(2):1–49
Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), vol 28, no 1
Cai L, Wang WY (2018) KBGAN: Adversarial learning for knowledge graph embeddings. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, pp 1470–1480. https://doi.org/10.18653/v1/N18-1133. https://aclanthology.org/N18-1133
Chaudhari S, Polatkan G, Ramanath R, Mithal V (2019) An attentive survey of attention models. arXiv preprint arXiv:1904.02874
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems (NIPS). Curran Associates, Inc., pp 5998–6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Brauwers G, Frasincar F (2021) A general survey on attention mechanisms in deep learning. IEEE Trans Knowl Data Eng 11(15):1–20. https://doi.org/10.1109/TKDE.2021.3126456
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd international conference on learning representations (ICLR), pp 1–15
Dong X, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: ACM SIGKDD conference on knowledge discovery and data mining (KDD), KDD2014, Association for Computing Machinery, New York, NY, USA, pp 601–610. https://doi.org/10.1145/2623330.2623623
Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph and text jointly embedding. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp 1591–1601. https://doi.org/10.3115/v1/D14-1167
Zhong H, Zhang J, Wang Z, Wan H, Chen Z (2015) Aligning knowledge and text embeddings by entity descriptions. In: Proceedings of the 2015 conference on empirical methods in natural language processing. Association for Computational Linguistics, Lisbon, Portugal, pp 267–272. https://doi.org/10.18653/v1/D15-1031
He S, Liu K, Ji G, Zhao J (2015) Learning to represent knowledge graphs with Gaussian embedding. In: Proceedings of the 24th ACM international on conference on information and knowledge management, CIKM ’15. Association for Computing Machinery, New York, NY, USA, pp 623–632. https://doi.org/10.1145/2806416.2806502
Xiao H, Huang M, Zhu X (2016) TransG: a generative model for knowledge graph embedding. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pp 2316–2325. https://doi.org/10.18653/v1/P16-1219
Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems (NIPS), Vol. 26. Curran Associates, Inc., pp 2787–2795. https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: 1st international conference on learning representations (ICLR)
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems (NIPS), vol 26, pp 3111–3119
Goldberg Y, Levy O (2014) word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722
Toutanova K, Chen D, Pantel P, Poon H, Choudhury P, Gamon M (2015) Representing text for joint embedding of text and knowledge bases. In: Proceedings of the 2015 conference on empirical methods in natural language processing. Association for Computational Linguistics, Lisbon, Portugal, pp 1499–1509. https://doi.org/10.18653/v1/D15-1174
Toutanova K, Chen D (2015) Observed versus latent features for knowledge base and text inference. In: Proceedings of the 3rd workshop on continuous vector space models and their compositionality. Association for Computational Linguistics, Beijing, China, pp 57–66. https://doi.org/10.18653/v1/W15-4007
Trouillon T, Dance C, Gaussier É, Welbl J, Riedel S, Bouchard G (2017) Knowledge graph completion via complex tensor factorization. J Mach Learn Res 18:130:1-130:38
Trouillon T, Welbl J, Riedel S, Gaussier É, Bouchard G (2016) Complex embeddings for simple link prediction. In: International conference on machine learning (ICML), PMLR, pp 2071–2080
Lin Y, Liu Z, Sun M (2016) Knowledge representation learning with entities, attributes and relations. In: International joint conference on artificial intelligence (IJCAI), vol 1, pp 41–52
Fan M, Zhou Q, Zheng T, Grishman R (2017) Distributed representation learning for knowledge graphs with entity descriptions. Pattern Recognit Lett 93:31–37
Dettmers T, Minervini P, Stenetorp P, Riedel S (2018) Convolutional 2D knowledge graph embeddings. In: The association for the advancement of artificial intelligence (AAAI), pp 1811–1818. https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17366
Chen X, Chen M, Shi W, Sun Y, Zaniolo C (2019) Embedding uncertain knowledge graphs. In: The association for the advancement of artificial intelligence (AAAI), vol 33, pp 3363–3370. https://doi.org/10.1609/aaai.v33i01.33013363
Guan S, Jin X, Wang Y, Jia Y, Shen H, Li Z, Cheng X (2018) Self-learning and embedding based entity alignment. Knowl Inf Syst 59(2):361–386
Li L, Wang P, Wang Y, Wang S, Yan J, Jiang J, Tang B, Wang C, Liu Y (2020) A method to learn embedding of a probabilistic medical knowledge graph: algorithm development. JMIR Med Inform 8(5):e17645–e17645
Fan M, Zhou Q, Abel A, Zheng T, Grishman R (2016) Probabilistic belief embedding for large-scale knowledge population. Cogn Comput 8:1087–1102
Fan M, Feng Q, Abel A, Zheng T, Grishman R (2015) Probabilistic belief embedding for knowledge base completion. arXiv preprint arXiv:1505.02433
Gong F, Wang M, Wang H, Wang S, Liu M (2021) SMR: Medical knowledge graph embedding for safe medicine recommendation. Big Data Res 23:100174
Yang B, Yih W-t, He X, Gao J, Deng L (2015) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575
Ji G, He S, Xu L, Liu K, Zhao J (2015) Knowledge graph embedding via dynamic mapping matrix. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, vol 1, pp 687–696. https://doi.org/10.3115/v1/p15-1067
Gutmann M, Hyvärinen A (2010) Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics (AISTATS), pp 297–304
Gutmann M, Hyvärinen A (2012) Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J Mach Learn Res 13:307–361
van den Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv:1807.03748 (2018)
Mnih A, Teh Y (2012) A fast and simple algorithm for training neural probabilistic language models. In: International conference on machine learning (ICML), pp 1–8
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980v9, pp 1–15
Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdiscip Rev Data Min Knowl Discov 8(4):1–18
Drumond L, Rendle S, Schmidt-Thieme L (2012) Predicting RDF triples in incomplete knowledge bases with tensor factorization. In: Proceedings of the 27th annual ACM symposium on applied computing, SAC ’12. Association for Computing Machinery, New York, NY, USA, pp 326–331. https://doi.org/10.1145/2245276.2245341
Chami I, Wolf A, Juan D-C, Sala F, Ravi S, Ré C (2020) Low-dimensional hyperbolic knowledge graph embeddings. In: Proceedings of the 58th annual meeting of the association for computational linguistics, association for computational linguistics, pp 6901–6914. https://doi.org/10.18653/v1/2020.acl-main.617.
Sun Z, Chen M, Hu W, Wang C, Dai J, Zhang W (2020) Knowledge association with hyperbolic knowledge graph embeddings. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, pp 5704–5716. https://doi.org/10.18653/v1/2020.emnlp-main.460
Sun Z, Deng ZH, Nie JY, Tang J (2019) RotatE: knowledge graph embedding by relational rotation in complex space. In: International conference on learning representations (ICLR)
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant No. 62172061; National Key R&D Program of China under Grant Nos. 2020YFB1711800 and 2020YFB1707900; the Science and Technology Project of Sichuan Province under Grant Nos. 2021GFW019, 2021YFG0152, 2021YFG0025, 2020YFG0479, 2020YFG0322, 2020GFW035, 2020GFW033; and the R&D Project of Chengdu City under Grant No. 2019-YF05-01790-GX.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A. Proof
Proof:
Sufficiency:
\( \because \max \left\{ p\left( {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ overall} \right) \right\} = p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) \),
\( \therefore \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ overall}, p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) > p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right) \);
\( \because \forall S_{ batch}, \forall {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ batch}, {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ batch} \subset {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ overall} \),
\( \therefore \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ batch}, p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) > p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right) \).
The sufficiency is proved.
Necessity:
\( \because \forall S_{ batch}, \max \left\{ p\left( {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ batch} \right) \right\} = p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) \),
\( \therefore \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{batch,0}, p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) > p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right) \),
\( \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{batch,1}, p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) > p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right) \),
\( \cdots \),
\( \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{batch,BS-1}, p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) > p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right) \),
From these statements, we can conclude that:
\( \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{batch,0} \cup {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{batch,1} \cup \cdots \cup {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{batch,BS-1} = {\cup }_{k4=0}^{BS-1}{\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }_{k4}^{ batch} \),
namely, since the batch matching sample spaces jointly cover the overall matching sample space,
\( \forall \left\{ {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right\} \in {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ overall} \), there is: \( p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) > p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=?} \right) \),
therefore,
\( \max \left\{ p\left( {\left\{ o_{k1} \rightarrow o_{k2}^{\prime } \right\} }^{ overall} \right) \right\} = p\left( {o_{k1} \rightarrow o_{k2}^{\prime } \vert }_{k2=*} \right) \).
The necessity is proved.
B. Comparison of probability knowledge representation learning models
See Table 17.
Notes:
1. Element-wise vector product.
2. Replacing h or t in the triple only.
3. f(\(\cdot \)): nonlinear function, e.g., ReLU (rectified linear units); \({\bar{h}},{\bar{r}}\): 2D form of h, r; [,]: concatenation; \(*\): convolution operator; \(\omega \): filter; vec(\(\cdot \)): reshaping as a vector; W: linear transformation matrix; t is replaced with T (the embeddings of multiple entities) when the score involves multiple triples. (A reconstruction of this score function is sketched after these notes.)
4. \(h_p,t_p\): projection vectors of the head and tail entity.
5. \(\lambda \): scaling coefficient.
6. \(\textbf{p},\textbf{m}\): embedding of a patient (p), a medicine (m), or a disease (d).
7. M: a set of medicines.
8. N: size of a knowledge graph instance.
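For reference, note 3 above lists the symbols of the ConvE [52] score function; as a reconstruction from the cited literature (not copied from Table 17), the score can be written as

\( \psi _{r}(h,t) = f\left( \mathrm {vec}\left( f\left( [\bar{\textbf{h}};\bar{\textbf{r}}] * \omega \right) \right) W \right) \textbf{t} \),

i.e., the reshaped head and relation embeddings are concatenated, convolved with the filters \(\omega \), vectorized, linearly projected by W, and finally matched against the tail embedding by an inner product.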
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Suo, X., Guo, B., Shen, Y. et al. KRL_Match: knowledge graph objects matching for knowledge representation learning. Knowl Inf Syst 65, 641–681 (2023). https://doi.org/10.1007/s10115-022-01764-8