Knowledge Base Error Detection with Relation Sensitive Embedding

Kim, San; Li, Xiuxing; Li, Kaiyu; Feng, Jianhua; Huang, Yan; Yang, Songfan

doi:10.1007/978-3-030-18576-3_43

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11446)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Knowledge bases (KBs) have become an increasingly essential data source for many applications and research efforts. Although modern KBs contain billions of facts, they remain incomplete relative to the facts that hold in the real world. They may also contain many inaccurate and outdated facts. While many studies address the incompleteness of KBs, very few consider detecting errors in them. Broadly speaking, there are three main challenges in detecting errors in KBs. (1) Symbolic and logical forms of knowledge representation cannot detect inconsistencies well on large-scale KBs. (2) It is hard to capture the correlations between relations. (3) There is no gold standard from which to learn or observe the patterns of inaccurate facts. In this work, we propose a Relation Sensitive Embedding Approach (RSEA) to detect inconsistencies in KBs. We first design two correlation functions to measure the relatedness between two relations. Then, we present a dynamic clustering algorithm that aggregates highly correlated relations into the same clusters. Finally, we encode discrete knowledge facts, together with the effects of correlated relations, into a continuous vector space, which can effectively detect errors in KBs. We perform extensive experiments on two benchmark datasets, and the results show that our approach achieves high performance in detecting incorrect knowledge facts in these KBs.
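
To make the idea of a correlation function between two relations concrete, the following is a minimal sketch. It is not the RSEA correlation function, which the abstract does not specify. It assumes, purely for illustration, that relatedness is measured by normalized pointwise mutual information (NPMI) over the head entities on which two relations co-occur. The function name `relation_npmi`, the toy triples, and the relation names are all hypothetical.

```python
import math
from collections import defaultdict

# Toy (head, relation, tail) facts; the entities and relations are made up.
triples = [
    ("a", "born_in", "x"),
    ("a", "nationality", "y"),
    ("b", "born_in", "x"),
    ("b", "nationality", "y"),
    ("c", "capital_of", "z"),
    ("d", "capital_of", "w"),
]


def relation_npmi(triples, r1, r2):
    """Illustrative relatedness of two relations: NPMI over shared head entities.

    Returns a value in [-1, 1]; -1 means the relations never share a head
    entity, 1 means they always occur on the same head entities.
    This is an assumed stand-in, not the function used in the paper.
    """
    # Collect the set of head entities observed with each relation.
    heads = defaultdict(set)
    for head, rel, _tail in triples:
        heads[rel].add(head)

    all_heads = set()
    for entity_set in heads.values():
        all_heads |= entity_set
    n = len(all_heads)

    h1 = heads.get(r1, set())
    h2 = heads.get(r2, set())
    if n == 0 or not h1 or not h2:
        return -1.0

    p1 = len(h1) / n
    p2 = len(h2) / n
    p12 = len(h1 & h2) / n
    if p12 == 0.0:
        return -1.0  # no co-occurrence at all
    if p12 == 1.0:
        return 1.0   # both relations appear on every head entity

    # NPMI = PMI / (-log p(r1, r2))
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


if __name__ == "__main__":
    print(relation_npmi(triples, "born_in", "nationality"))
    print(relation_npmi(triples, "born_in", "capital_of"))
```

Under this assumed measure, relation pairs that score above some chosen threshold would be candidates for the same cluster; the threshold and the clustering procedure itself are left open here because the abstract does not describe them.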

Acknowledgement

This work was supported by the 973 Program of China (2015CB358700), NSF of China (61632016, 61521002, 61661166012), and TAL education.

Author information

Authors and Affiliations

Tsinghua University, Beijing, China
San Kim, Xiuxing Li, Kaiyu Li & Jianhua Feng
TAL Education Group, Beijing, China
Yan Huang & Songfan Yang

Corresponding author

Correspondence to San Kim.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong

Cite this paper

Kim, S., Li, X., Li, K., Feng, J., Huang, Y., Yang, S. (2019). Knowledge Base Error Detection with Relation Sensitive Embedding. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_43

DOI: https://doi.org/10.1007/978-3-030-18576-3_43
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18575-6
Online ISBN: 978-3-030-18576-3
eBook Packages: Computer Science, Computer Science (R0)
