Abstract
A knowledge base of triples of the form (subject entity, predicate relation, object entity) is an important resource for knowledge management. It supports human-like reasoning, query expansion, question answering (e.g., Siri) and other related AI tasks. However, such a knowledge base often suffers from incompleteness, because the volume of real-world knowledge keeps growing and the knowledge base lacks reasoning capability. In this paper, we propose a Pairwise-interaction Differentiated Embeddings model that embeds the entities and relations of a knowledge base into low-dimensional vector representations and then predicts how likely additional facts are to be true, in order to extend the knowledge base. In addition, we present a probability-based objective function to improve model optimization. Finally, we evaluate the model on the knowledge base completion task by computing how likely an additional triple is to be true. Experiments on WordNet and Freebase demonstrate the strong performance of our model and algorithm.



Notes
google.com/insidesearch/features/search/knowledge.html, 10-05-2015.
dbpedia.org, 10-05-2015.
geneontology.org, 10-05-2015.
Such as (subject entity, object entity), (subject entity, predicate relation) and (object entity, predicate relation).
For simplicity, in the remainder of the paper we use subject to refer to the subject entity, predicate to refer to the predicate relation, and object to refer to the object entity.
A total order is a binary relation (here denoted by \(\ge \)) that is antisymmetric, transitive and total.
\(\forall o_1, o_2 \in E: o_1 \ge _{s,p} o_2 \wedge o_2 \ge _{s,p} o_1 \Rightarrow o_1 = o_2\) (antisymmetry).
\(\forall o_1, o_2, o_3 \in E: o_1 \ge _{s,p} o_2 \wedge o_2 \ge _{s,p} o_3 \Rightarrow o_1 \ge _{s,p} o_3\) (transitivity).
\(\forall o_1,o_2 \in E: o_1 \ne o_2 \Rightarrow o_1 \ge _{s,p} o_2 \vee o_2 \ge _{s,p} o_1\) (totality).
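As an illustration of the three properties above, the following minimal sketch checks them by brute force on a small finite set. The helper names `is_total_order` and `geq_from_scores`, and the example scores, are hypothetical and are not taken from the paper; the sketch only assumes that the relation \(\ge _{s,p}\) for a fixed subject and predicate can be evaluated as a Python predicate.

```python
from itertools import product


def is_total_order(elements, geq):
    """Check antisymmetry, transitivity and totality of `geq` on a finite set."""
    elements = list(elements)
    for a, b in product(elements, repeat=2):
        # Antisymmetry: a >= b and b >= a implies a == b.
        if geq(a, b) and geq(b, a) and a != b:
            return False
        # Totality: any two distinct elements must be comparable.
        if a != b and not (geq(a, b) or geq(b, a)):
            return False
    for a, b, c in product(elements, repeat=3):
        # Transitivity: a >= b and b >= c implies a >= c.
        if geq(a, b) and geq(b, c) and not geq(a, c):
            return False
    return True


def geq_from_scores(scores):
    """Build a relation that compares objects by a (hypothetical) score."""
    return lambda a, b: scores[a] >= scores[b]


# Hypothetical scores of candidate objects for one fixed (subject, predicate).
distinct = {"o1": 0.9, "o2": 0.4, "o3": 0.1}
tied = {"o1": 0.5, "o2": 0.5}

print(is_total_order(distinct, geq_from_scores(distinct)))  # True
print(is_total_order(tied, geq_from_scores(tied)))          # False
```

The second call fails the antisymmetry check: two distinct objects with equal scores each dominate the other, so a relation induced by scores is a total order only when no two distinct objects tie.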
\(f_1,f_2,f_3\) denote the pairwise-interaction functions.
We do not replace both the subject entity and the object entity with random ones at the same time.
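A minimal sketch of this corruption rule follows. The function name `corrupt_triple`, the 50/50 choice between the two sides, uniform sampling, and the exclusion of the original entity are all assumptions made for illustration; the note itself only states that the subject and the object are never replaced together.

```python
import random


def corrupt_triple(triple, entities, rng=random):
    """Return a copy of `triple` with the subject OR the object replaced, never both."""
    s, p, o = triple
    if rng.random() < 0.5:
        # Replace the subject; exclude the original so the triple really changes.
        candidates = [e for e in entities if e != s]
        return (rng.choice(candidates), p, o)
    # Otherwise replace the object and keep the subject fixed.
    candidates = [e for e in entities if e != o]
    return (s, p, rng.choice(candidates))


# Hypothetical entity vocabulary and observed triple.
entities = ["e1", "e2", "e3", "e4"]
observed = ("e1", "r1", "e2")
print(corrupt_triple(observed, entities, random.Random(0)))
```

This sketch does not check whether the corrupted triple happens to be a known true fact; whether such triples should be filtered out is a separate design choice.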
\([x]_+\) denotes the positive part of \(x\) (i.e., \([x]_+ := \max \{0,x\}\)).
The entities of WordNet are denoted by the concatenation of a word, its POS tag and a number that refers to its sense. E.g., “_payment_NN_1” encodes the first meaning of the noun “payment”.
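The naming scheme described above can be decoded with a few lines of Python. This is a sketch under stated assumptions: the function name `parse_wordnet_entity` is hypothetical, and the handling of words that themselves contain underscores is a guess rather than something specified in the note.

```python
def parse_wordnet_entity(name):
    """Split an identifier such as "_payment_NN_1" into (word, pos_tag, sense)."""
    # Split from the right so that a word containing underscores stays intact
    # (assumption: only the last two fields are the POS tag and sense number).
    word, pos_tag, sense = name.rsplit("_", 2)
    # Drop the single leading underscore that prefixes the identifier.
    if word.startswith("_"):
        word = word[1:]
    return word, pos_tag, int(sense)


print(parse_wordnet_entity("_payment_NN_1"))  # ('payment', 'NN', 1)
```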
Acknowledgments
This work was supported by the Natural Science Foundation of China under Grant Nos. 61300080 and 61273217, the 111 Project under Grant No. B08004 and the FP7 MobileCloud Project under Grant No. 612212. The authors are partially supported by the Key Project of the China Ministry of Education under Grant No. MCM20130310, Huawei’s Innovation Research Program and the Postgraduate Innovation Fund of SICE, BUPT, 2015. We are thankful to the anonymous reviewers of DMKD, whose comments helped us improve this work.
Additional information
Responsible editors: Joao Gama, Indre Zliobaite, Alipio Jorge, Concha Bielza.
Cite this article
Zhao, Y., Gao, S., Gallinari, P. et al. Knowledge base completion by learning pairwise-interaction differentiated embeddings. Data Min Knowl Disc 29, 1486–1504 (2015). https://doi.org/10.1007/s10618-015-0430-1
DOI: https://doi.org/10.1007/s10618-015-0430-1