Abstract
An open knowledge base (OKB) is a repository of facts, typically represented as \(\langle\)subject; relation; object\(\rangle\) triples. Canonicalizing OKB triples means mapping the different names in the triples that refer to the same entity to a single canonical form. We propose the Multi-Level Canonicalization with Embeddings (MULCE) algorithm for this task. MULCE executes in two steps. The first step performs word-level canonicalization, coarsely grouping subject names into semantically similar clusters based on their GloVe vectors. The second step performs sentence-level canonicalization, refining the clusters by using BERT embeddings to model relation and object information. Our experimental results show that MULCE outperforms state-of-the-art methods.
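The two-step idea can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written vectors below are toy stand-ins for GloVe and BERT embeddings, the function names (`threshold_clusters`, `two_level_canonicalize`), the thresholds, and the single-link grouping rule are all assumptions made for illustration, and each subject name gets one "sentence-level" vector as a simplification.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def threshold_clusters(items, vectors, threshold):
    """Single-link grouping (an assumed rule): two items share a cluster
    when a chain of pairs with cosine >= threshold connects them."""
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


def two_level_canonicalize(names, word_vecs, sent_vecs, coarse_t, fine_t):
    """Step 1: coarse clusters from word-level vectors.
    Step 2: split each coarse cluster using sentence-level vectors."""
    refined = []
    for coarse in threshold_clusters(names, word_vecs, coarse_t):
        refined.extend(threshold_clusters(coarse, sent_vecs, fine_t))
    return sorted(refined)


# Toy data (hypothetical): word-level vectors stand in for GloVe,
# sentence-level vectors stand in for BERT encodings of the
# relation and object context of each subject name.
NAMES = ["Barack Obama", "Obama", "Michelle Obama", "NYC", "New York City"]
WORD_VECS = {
    "Barack Obama": [1.0, 0.1, 0.0],
    "Obama": [0.9, 0.2, 0.0],
    "Michelle Obama": [0.8, 0.3, 0.0],
    "NYC": [0.0, 0.1, 1.0],
    "New York City": [0.0, 0.2, 0.9],
}
SENT_VECS = {
    "Barack Obama": [1.0, 0.0],
    "Obama": [0.95, 0.1],
    "Michelle Obama": [0.1, 1.0],
    "NYC": [0.5, 0.5],
    "New York City": [0.6, 0.5],
}

clusters = two_level_canonicalize(NAMES, WORD_VECS, SENT_VECS, 0.9, 0.9)
print(clusters)
```

In this toy run the word-level step places all three "Obama" names in one coarse cluster, and the sentence-level step separates "Michelle Obama" because its context vector differs. This mirrors the coarse-then-refine structure described in the abstract.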
Notes
- 3. Using BERT-Base Uncased Model: https://github.com/google-research/bert.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, TH., Kao, B., Wu, Z., Feng, X., Song, Q., Chen, C. (2020). MULCE: Multi-level Canonicalization with Embeddings of Open Knowledge Bases. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12342. Springer, Cham. https://doi.org/10.1007/978-3-030-62005-9_23
DOI: https://doi.org/10.1007/978-3-030-62005-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62004-2
Online ISBN: 978-3-030-62005-9
eBook Packages: Computer Science (R0)