AWML: adaptive weighted margin learning for knowledge graph embedding

Journal of Intelligent Information Systems

Abstract

Knowledge representation learning (KRL), exploited by applications such as question answering and information retrieval, aims to embed the entities and relations of a knowledge graph as points in a vector space such that the semantic and structural information of the graph is well preserved in the representation space. However, previous work mainly learned the embeddings by treating each entity and relation equally, which tends to ignore the inherent imbalance and heterogeneity of knowledge graphs. By visualizing in detail the representations produced by the classic algorithm TransE, we reveal the disadvantages caused by this homogeneous learning strategy and gain insight into how to design a learning policy that accounts for heterogeneity. In this paper, we propose a novel margin-based pairwise representation learning framework that can be incorporated into many KRL approaches, introducing adaptivity according to the degree of knowledge heterogeneity. More specifically, we first propose an adaptive margin, based on the distribution density of samples, to separate the real samples from the fake samples in the embedding space, and we then suggest an adaptive weight to explicitly address the trade-off between the contributions of the real and fake samples. Experiments show that our Adaptive Weighted Margin Learning (AWML) framework helps previous models achieve better performance on the real-world knowledge graphs Freebase and WordNet in both link prediction and triplet classification.
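
To ground the terminology, the following is a minimal sketch of a margin-based pairwise loss of the kind the framework adapts, assuming a TransE-style translation score; the per-relation margin and the two weights stand in for the adaptive quantities described above and are simplifications introduced for this sketch, not the paper's exact formulation:

```python
import numpy as np

def transe_score(h, r, t):
    # Dissimilarity of a triplet under the translation assumption h + r ≈ t.
    return np.linalg.norm(h + r - t, ord=1)

def awml_pairwise_loss(golden, synthetic, margin, w_real, w_fake):
    # Hinge-style pairwise loss: the golden (real) triplet should score
    # lower than the synthetic (fake) one by at least `margin`.
    # `margin` is assumed to be set adaptively from the local sample
    # density; `w_real` and `w_fake` weight the two contributions.
    h, r, t = golden
    h2, r2, t2 = synthetic
    return max(0.0, margin + w_real * transe_score(h, r, t)
                           - w_fake * transe_score(h2, r2, t2))
```

With w_real = w_fake = 1 and a fixed global margin, this reduces to the standard pairwise objective of Bordes et al. (2013).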


Notes

  1. We compute hpt_r (the average number of distinct heads per tail) and tph_r (the average number of distinct tails per head) to classify the relations into four types: 1-to-1, 1-to-MANY, MANY-to-1 and MANY-to-MANY, following Bordes et al. (2013). If the average number hpt_r or tph_r is below 1.5, the corresponding argument is labeled 1, and MANY otherwise; a sketch of this computation follows.
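
A minimal sketch of that classification, assuming the knowledge graph is given as (head, relation, tail) tuples; function and variable names here are illustrative, not from the paper's released code:

```python
from collections import defaultdict

def classify_relations(triplets, threshold=1.5):
    # For each relation r, hpt is the average number of distinct heads
    # per (r, tail) pair and tph the average number of distinct tails
    # per (r, head) pair, as in Bordes et al. (2013).
    heads = defaultdict(lambda: defaultdict(set))  # r -> tail -> {heads}
    tails = defaultdict(lambda: defaultdict(set))  # r -> head -> {tails}
    for h, r, t in triplets:
        heads[r][t].add(h)
        tails[r][h].add(t)
    types = {}
    for r in heads:
        hpt = sum(map(len, heads[r].values())) / len(heads[r])
        tph = sum(map(len, tails[r].values())) / len(tails[r])
        head_side = '1' if hpt < threshold else 'MANY'
        tail_side = '1' if tph < threshold else 'MANY'
        types[r] = f'{head_side}-to-{tail_side}'
    return types
```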

  2. The Knowledge Graph API lets us search Google Knowledge Graph for entities that match the constraints. This API is available at https://developers.google.com/knowledge-graph/.

  3. The X in CTransX can be replaced by E, R, etc., which refers to CTransE or CTransR respectively.

  4. It is in an average sense that \(\boldsymbol {\hat {r}}\) should be close to r and that \(\boldsymbol {\hat {r}^{\prime }}\) should be further away from r. For the synthetic \(\boldsymbol {\hat {r}^{\prime }}\), being further away from r is relative, measured against the golden \(\boldsymbol {\hat {r}}\); the condition is written out below.
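
Written out, and assuming the translation-based reading \(\boldsymbol{\hat{r}} = \boldsymbol{t} - \boldsymbol{h}\) for a golden triplet and \(\boldsymbol{\hat{r}^{\prime}} = \boldsymbol{t^{\prime}} - \boldsymbol{h^{\prime}}\) for a synthetic one (an assumption of this paraphrase), the requirement is:

```latex
% Average-sense requirement: golden difference vectors sit closer to r
% than synthetic ones do, over the training distribution.
\mathbb{E}_{(h,r,t)}\!\left[\,\|\hat{\mathbf{r}} - \mathbf{r}\|\,\right]
\;<\;
\mathbb{E}_{(h^{\prime},r,t^{\prime})}\!\left[\,\|\hat{\mathbf{r}}^{\prime} - \mathbf{r}\|\,\right]
```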

  5. We discovered in our reproduction experiments that the original construction rule makes the KRL model perform poorly in the classification task.

  6. Source code and datasets for reproducing the experiments presented in this paper are available online: https://github.com/orangegcc/AWML/

  7. We only normalize the relation embeddings in the first epoch, following Bordes et al. (2013); the schedule is sketched below.
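
A minimal sketch of that schedule; the `sgd_step` callback is a hypothetical placeholder for the actual minibatch update, and normalizing entities once per epoch is a simplification of the per-minibatch normalization in Bordes et al. (2013):

```python
import numpy as np

def l2_normalise_rows(emb):
    # Project each embedding vector (row) onto the unit sphere, in place.
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def run_training(entity_emb, relation_emb, n_epochs, sgd_step):
    for epoch in range(n_epochs):
        l2_normalise_rows(entity_emb)        # entities: every epoch
        if epoch == 0:
            l2_normalise_rows(relation_emb)  # relations: first epoch only
        sgd_step(entity_emb, relation_emb)
```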

  8. Note that our evaluation results for TransE, TransE(AdaGrad), TransR and CTransR may differ from those reported in the original works, because the synthetic-triple replacement rule in the loss function (see (2)) differs substantially between implementations. In our framework, the relation is additionally considered in the replacement rule so that the KRL model is suitable not only for link prediction but also for the triplet classification task. Moreover, some hyper-parameter settings differ between our framework and other works; we choose the best hyper-parameter configuration in our experiments.

  9. In addition to the standard evaluation metric HITS@10, we report two further HITS@n cutoffs to investigate how sensitive the performance is to the HITS size; a sketch of the metric family follows.
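
A minimal sketch of the metric family, where `ranks` holds the rank of the correct entity for each test triplet; the cutoff set shown is an assumption of this sketch, since the note does not name the two additional sizes:

```python
def hits_and_mean_rank(ranks, cutoffs=(1, 5, 10)):
    # HITS@n: fraction of test triplets whose correct entity is ranked
    # within the top n; MeanRank: average rank of the correct entity.
    hits = {n: sum(r <= n for r in ranks) / len(ranks) for n in cutoffs}
    mean_rank = sum(ranks) / len(ranks)
    return hits, mean_rank
```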

  10. Note that the models we compare are CTransX (the baseline) and CTransX+AWL/AML, not the original TransX models (TransE, TransE(AG) and TransR), so the evaluation results of TransX are never marked in bold. Besides, to differentiate close numerical values, we keep three decimal places for MeanRank_c on WN18 in the triplet classification evaluation.

  11. Please note that the results under the Raw setting differ greatly from those in other papers; this stems from the modified ranking approach described in the evaluation protocol. When computing the f score of a test triplet with each candidate entity, we use more than one sub-relation to calculate the neighborhood score and choose the best one as the final sub-relation, so the correct head/tail ranks higher than under the former evaluation method.

  12. Our evaluation results show that for the CTransE model, AML is better than AWL for link prediction while AWL is better than AML for classification, yet for some other models the opposite holds. This is because link prediction tends to be performed well by embeddings in which the head h is close to the vector t − r, whereas triplet classification tends to be performed well by embeddings in which the relation r is close to the vector t − h. It is worth noting that, in the average sense, these two conditions are neither necessary nor sufficient for each other, so the performances on the two tasks are not perfectly aligned: a KRL model can be competitive on one task yet less impressive on the other (see the sketch below).
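
One way to see why the two tasks can diverge is to compare how the same triplet score f is used in each protocol: link prediction ranks all candidate entities, while triplet classification (following the per-relation threshold protocol of Socher et al. 2013) compares the score against a learned threshold. In notation:

```latex
% Link prediction ranks the correct head h among all candidates e,
% while triplet classification thresholds the same score against a
% relation-specific \delta_r learned on validation data:
\mathrm{rank}(h) = \bigl|\{\, e : f(e, r, t) \le f(h, r, t) \,\}\bigr|,
\qquad
\mathrm{classify}(h, r, t) = \bigl[\, f(h, r, t) < \delta_r \,\bigr].
% A triplet can rank first among candidates yet still exceed \delta_r
% (and vice versa), so strength on one task need not transfer.
```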

  13. In Fig. 7, some synthetic triplets do spread around the relation embedding r after the CTransE model is combined with our AWL framework. However, this does not violate our expectation for the representation distribution, because the requirement that the implicit vector of a synthetic triplet be further from the relation embedding than that of a golden triplet holds only in an average and relative sense. During KRL training, in order to keep the total loss low enough, the model tends to let a small number of synthetic triplets contradict the above statement.

References

  • Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on management of data (SIGMOD '08) (pp. 1247–1250).

  • Bordes, A., Glorot, X., Weston, J., Bengio, Y. (2012). Joint learning of words and meaning representations for open-text semantic parsing. International Conference on Artificial Intelligence & Statistics, 22, 127–135.

  • Bordes, A., Glorot, X., Weston, J., Bengio, Y. (2014). A semantic matching energy function for learning with multi-relational data: application to word-sense disambiguation. Machine Learning, 94(2), 233–259.

  • Bordes, A., Usunier, N., Weston, J., Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems, 26, 2787–2795.

  • Bordes, A., Weston, J., Collobert, R., Bengio, Y. (2009). Learning structured embeddings of knowledge bases. In AAAI conference on artificial intelligence (pp. 301–306).

  • Boser, B.E., Guyon, I.M., Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory - COLT ’92 (pp. 144–152).

  • Duchi, J., Hazan, E., Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12(1532-4435), 2121–2159.

  • Ferrández, A., Maté, A., Peral, J., Trujillo, J., De Gregorio, E., Aufaure, M.A. (2016). A framework for enriching data warehouse analysis with question answering systems. Journal of Intelligent Information Systems, 46(1), 61–82.

  • Han, X., Zhang, C., Guo, C. (2018). A generalization of recurrent neural networks for graph embedding. In Proceedings of the 22nd Pacific-Asia conference on knowledge discovery and data mining. Melbourne.

  • He, S., Liu, K., Ji, G., Zhao, J. (2015). Learning to represent knowledge graphs with gaussian embedding. In Proceedings of the 24th ACM international on conference on information and knowledge management - CIKM ’15 (pp. 623–632).

  • Jenatton, R., Bordes, A., Roux, N.L., Obozinski, G. (2012). A latent factor model for highly multi-relational data. Advances in Neural Information Processing Systems, 25, 3167–3175.

  • Ji, G., He, S., Xu, L., Liu, K., Zhao, J. (2015). Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers, pp. 687–696).

  • Lin, Y., Liu, Z., Luan, H., Sun, M., Rao, S., Liu, S. (2015a). Modeling relation paths for representation learning of knowledge bases. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 705–714). Stroudsburg: Association for Computational Linguistics.

  • Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X. (2015b). Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence (pp. 2181–2187).

  • Maaten, L.v.d., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

  • Metzger, S., Schenkel, R., Sydow, M. (2017). QBEES: query-by-example entity search in semantic knowledge graphs based on maximal aspects, diversity-awareness and relaxation. Journal of Intelligent Information Systems, 49(3), 333–366.

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In NIPS (pp. 1–9).

  • Miller, G.A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39–41.

  • Minervini, P., D’Amato, C., Fanizzi, N. (2016). Efficient energy-based embedding models for link prediction in knowledge graphs. Journal of Intelligent Information Systems, 47(1), 91–109.

  • Miyamoto, Y., & Cho, K. (2016). Gated word-character recurrent language model. In Proceedings of the 2016 conference on empirical methods in natural language processing (pp. 1992–1997).

  • Nickel, M., Tresp, V., Kriegel, H.-P. (2012). Factorizing YAGO: scalable machine learning for linked data. In Proceedings of the 21st international conference on World Wide Web (pp. 271–280).

  • Nickel, M., Tresp, V., Kriegel, H.-P. (2011). A three-way model for collective learning on multi-relational data. In ICML (pp. 809–816).

  • Nickel, M., Rosasco, L., Poggio, T. (2015). Holographic embeddings of knowledge graphs. In Thirtieth AAAI conference on artificial intelligence.

  • Shi, B., & Weninger, T. (2017). ProjE: embedding projection for knowledge graph completion. In AAAI.

  • Socher, R., Chen, D., Manning, C.D., Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems (pp. 926–934).

  • Sutskever, I., Salakhutdinov, R., Tenenbaum, J.B. (2009). Modelling relational data using Bayesian clustered tensor factorization. Advances in Neural Information Processing Systems, 22, 1–8.

  • Wang, R., Cully, A., Chang, H.J., Demiris, Y. (2017). MAGAN: margin adaptation for generative adversarial networks.

  • Wang, Z., Zhang, J., Feng, J., Chen, Z. (2014). Knowledge graph embedding by translating on hyperplanes. In AAAI conference on artificial intelligence (pp. 1112–1119).

  • Weston, J., & Watkins, C. (1999). Support vector machines for multi-class pattern recognition. In Proceedings of the 7th European symposium on artificial neural networks (ESANN-99) (pp. 219–224).

  • Xiao, H., Huang, M., Hao, Y., Zhu, X. (2015). TransA: an adaptive approach for knowledge graph embedding. arXiv:1509.0.

  • Xiao, H., Huang, M., Yu, H., Zhu, X. (2016). TransG: a generative mixture model for knowledge graph embedding. In Proceedings of ACL (pp. 2316–2325).

  • Xie, R., Liu, Z., Jia, J., Luan, H., Sun, M. (2016). Representation learning of knowledge graphs with entity descriptions. In AAAI (pp. 2659–2665).

  • Yang, Z., Dhingra, B., Yuan, Y., Hu, J., Cohen, W.W., Salakhutdinov, R. (2016). Words or characters? Fine-grained gating for reading comprehension.

  • Zhang, C., Zhou, M., Han, X., Hu, Z., Ji, Y. (2017). Knowledge graph embedding for hyper-relational data. Tsinghua Science and Technology, 22(2), 185–197.

  • Zhao, F., Min, M.R., Shen, C., Chakraborty, A. (2017). Convolutional neural knowledge graph learning. arXiv:1710.0.

  • Zhou, M., Zhang, C., Han, X., Ji, Y., Hu, Z., Qiu, X. (2016). Knowledge graph completion for hyper-relational data. In Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (Vol. 9784, pp. 236–246).

Download references

Acknowledgements

This research is supported by the National Natural Science Foundation of China under Grants No. 61602048, No. 61601046 and No. 61520106007.

Author information

Correspondence to Chunhong Zhang.

Cite this article

Guo, C., Zhang, C., Han, X. et al. AWML: adaptive weighted margin learning for knowledge graph embedding. J Intell Inf Syst 53, 167–197 (2019). https://doi.org/10.1007/s10844-018-0535-2
