Abstract
Recently, knowledge bases (KBs) have become more and more essential and helpful data source for various applications and researches. Although modern KBs have included thousands of millions of facts, they still suffer from incompleteness compared with the total amount of facts in real world. Furthermore, a lot of inaccurate and outdated facts may be contained in the KBs. Although there have been many studies dealing with incompleteness of the KBs, very few of works have taken into account detecting the errors in the KBs. Broadly speaking, there are three main challenges in detecting errors in the KBs. (1) Symbolic and logical form of the knowledge representations cannot detect the inconsistencies very well on large scale KBs. (2) It is hard to capture the correlations between relations. (3) There is no golden standard to learn or observe the patterns of inaccurate facts. In this work, we propose a Relation Sensitive Embedding Approach (RSEA) to detect the inconsistencies from KBs. We first design two correlation functions to measure the relatedness between two relations. Then, a dynamic cluster algorithm is presented to aggregate highly correlated relations into the same clusters. Finally, we encode discrete knowledge facts with effects of correlated relations into continuous vector space, which can effectively detect errors in KBs. We perform extensive experiments on two benchmark datasets, and the results show that our approach achieves high performance in detecting incorrect knowledge facts in these KBs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI, pp. 265–283 (2016)
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., Lehmann, J.: Crowdsourcing linked data quality assessment. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8219, pp. 260–276. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_17
Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD (2008)
Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing. In: AISTATS, pp. 127–135 (2012)
Bordes, A., Glorot, X., Weston, J., Bengio, Y.: A semantic matching energy function for learning with multi-relational data - application to word-sense disambiguation. Mach. Learn. 94(2), 233–259 (2014)
Bordes, A., Usunier, N., GarcÃa-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NIPS, pp. 2787–2795 (2013)
Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: AAAI (2011)
Bouma, G.: Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of the Biennial GSCL Conference, pp. 31–40 (2009)
Chu, X., et al.: KATARA: a data cleaning system powered by knowledge bases and crowdsourcing. In: SIGMOD (2015)
Church, K.W., Hanks, P.: Word association norms, mutual information, and lexicography. Comput. Linguist. 16(1), 22–29 (1990)
Deng, D., Jiang, Y., Li, G., Li, J., Yu, C.: Scalable column concept determination for web tables using large knowledge bases. PVLDB 6(13), 1606–1617 (2013)
Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings. In: AAAI, pp. 1811–1818 (2018)
Dongo, I., Cardinale, Y., Chbeir, R.: RDF-F: RDF datatype inferring framework: towards better RDF document matching. Data Sci. Eng. 3(2), 115–135 (2018)
Fan, J., Lu, M., Ooi, B.C., Tan, W., Zhang, M.: A hybrid machine-crowdsourcing system for matching web tables. In: ICDE, pp. 976–987 (2014)
Goldberg, Y., Levy, O.: Word2vec explained: deriving Mikolov et al’.s negative-sampling word-embedding method. CoRR abs/1402.3722 (2014)
Hao, S., Tang, N., Li, G., He, J., Ta, N., Feng, J.: A novel cost-based model for data repairing. In: ICDE, pp. 49–50 (2017)
Hao, S., Tang, N., Li, G., Li, J.: Cleaning relations using knowledge bases. In: ICDE, pp. 933–944 (2017)
Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. In: ACL, pp. 687–696 (2015)
Kim, S., Li, G., Feng, J., Li, K.: Web table understanding by collective inference. In: CIKM, pp. 217–226 (2018)
Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)
Li, K., Li, G.: Approximate query processing: what is new and where to go? A survey on approximate query processing. Data Sci. Eng. 3(4), 379–397 (2018)
Lin, P., Song, Q., Wu, Y.: Fact checking in knowledge graphs with ontological subgraph patterns. Data Sci. Eng. 3(4), 341–358 (2018)
Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI, pp. 2181–2187 (2015)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Nickel, M., Rosasco, L., Poggio, T.A.: Holographic embeddings of knowledge graphs. In: AAAI, pp. 1955–1961 (2016)
Socher, R., Chen, D., Manning, C.D., Ng, A.Y.: Reasoning with neural tensor networks for knowledge base completion. In: NIPS, pp. 926–934 (2013)
Socher, R., Huval, B., Manning, C.D., Ng, A.Y.: Semantic compositionality through recursive matrix-vector spaces. In: EMNLP-CoNLL, pp. 1201–1211 (2012)
Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW (2007)
Töpper, G., Knuth, M., Sack, H.: DBpedia ontology enrichment for inconsistency detection. In: 8th International Conference on Semantic Systems, I-SEMANTICS 2012, pp. 33–40 (2012)
Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 1112–1119 (2014)
Wang, Z., Li, J.: Text-enhanced representation learning for knowledge graph. In: ICAI, pp. 1293–1299 (2016)
Xiao, H., Huang, M., Zhu, X.: From one point to a manifold: knowledge graph embedding for precise link prediction. In: IJCAI, pp. 1315–1321 (2016)
Acknowledgement
This work was supported by the 973 Program of China (2015CB358700), NSF of China (61632016, 61521002, 61661166012), and TAL education.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, S., Li, X., Li, K., Feng, J., Huang, Y., Yang, S. (2019). Knowledge Base Error Detection with Relation Sensitive Embedding. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science(), vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_43
Download citation
DOI: https://doi.org/10.1007/978-3-030-18576-3_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18575-6
Online ISBN: 978-3-030-18576-3
eBook Packages: Computer ScienceComputer Science (R0)