DOI: 10.1145/3589334.3645700
research-article

Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning

Published: 13 May 2024

Abstract

Recent years have seen growing interest in semantic knowledge integration between curated knowledge bases (CKBs) and open knowledge bases (OKBs), a non-trivial problem because CKBs and OKBs have intrinsically heterogeneous features. OKB canonicalization and OKB linking are regarded as two vital tasks for achieving this integration. Although the two tasks are inherently complementary, previous studies solve them either separately or through only superficial interaction. To address this issue, we propose CLUE, a novel framework that jointly encodes the OKB and CKB into a unified embedding space in order to tackle OKB canonicalization and OKB linking simultaneously and let each task benefit the other. We design an expectation-maximization (EM) based approach that iteratively refines the unified embedding space by alternating between seed generation and embedding refinement, leveraging the deep interaction between OKB canonicalization and OKB linking. Curriculum learning is employed to adaptively yield high-quality canonicalization seeds and linking seeds according to two purpose-built metrics: a margin-based linking metric and an entropy-based cluster metric. A thorough experimental study on two public benchmark data sets demonstrates that CLUE consistently outperforms state-of-the-art baselines on OKB canonicalization in terms of average F1 and on OKB linking in terms of accuracy.
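
The abstract names the two seed-selection metrics but does not define them, so the following is only a rough illustration of the general idea and not the formulation used in CLUE. It assumes that a margin-based linking metric can be read as the gap between the best and second-best candidate similarity, and that an entropy-based cluster metric can be read as the Shannon entropy of a soft cluster assignment. All function names, thresholds, and the toy data are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linking_margin(similarities):
    """Assumed margin metric: best minus second-best candidate similarity."""
    if len(similarities) < 2:
        raise ValueError("need at least two candidate similarities")
    ranked = sorted(similarities, reverse=True)
    return ranked[0] - ranked[1]


def cluster_entropy(probabilities):
    """Assumed cluster metric: Shannon entropy of a soft cluster assignment."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_seeds(phrases, min_margin, max_entropy):
    """Keep phrases that look confident under each (assumed) metric.

    phrases maps a noun phrase to a pair:
    (similarities to candidate CKB entities, raw scores for candidate clusters).
    """
    linking_seeds, canonicalization_seeds = [], []
    for name, (link_sims, cluster_scores) in phrases.items():
        if linking_margin(link_sims) >= min_margin:
            linking_seeds.append(name)
        if cluster_entropy(softmax(cluster_scores)) <= max_entropy:
            canonicalization_seeds.append(name)
    return linking_seeds, canonicalization_seeds


# Toy data (hypothetical): one unambiguous phrase, one ambiguous phrase.
toy_phrases = {
    "NYC": ([0.92, 0.30, 0.10], [5.0, 0.0, 0.0]),
    "Washington": ([0.55, 0.52, 0.20], [1.0, 0.9, 0.8]),
}

# Strict thresholds admit only the easy phrase; relaxed thresholds admit both.
strict = select_seeds(toy_phrases, min_margin=0.3, max_entropy=0.5)
relaxed = select_seeds(toy_phrases, min_margin=0.01, max_entropy=1.2)
```

Under these assumptions, a curriculum-style schedule would start with strict thresholds, so that only easy, high-confidence phrases become seeds, and loosen them in later rounds as the embeddings improve.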

      Published In

      WWW '24: Proceedings of the ACM Web Conference 2024
      May 2024
      4826 pages
      ISBN: 9798400701719
      DOI: 10.1145/3589334

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. open knowledge base canonicalization
      2. open knowledge base linking
      3. unified embedding learning

      Qualifiers

      • Research-article

      Conference

      WWW '24
      WWW '24: The ACM Web Conference 2024
      May 13 - 17, 2024
      Singapore, Singapore

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
