research-article

Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning

Authors:

Yinan Liu

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 2304 - 2314

https://doi.org/10.1145/3589334.3645700

Published: 13 May 2024

Abstract

Recent years have witnessed increasing attention to semantic knowledge integration between curated knowledge bases (CKBs) and open knowledge bases (OKBs). This integration is non-trivial because CKBs and OKBs have intrinsically heterogeneous features. OKB canonicalization and OKB linking are regarded as two vital tasks for achieving it. Although the two tasks are inherently complementary, previous studies either solve them separately or let them interact only superficially. To address this issue, we propose CLUE, a novel framework that jointly encodes the OKB and the CKB into a unified embedding space, so that OKB canonicalization and OKB linking are tackled simultaneously and benefit each other reciprocally. We design an expectation-maximization (EM) based approach that iteratively refines the unified embedding space by alternating between seed generation and embedding refinement, leveraging the deep interaction between OKB canonicalization and OKB linking. Curriculum learning is employed to adaptively yield high-quality canonicalization seeds and linking seeds according to two elaborately designed metrics: a margin-based linking metric and an entropy-based cluster metric. A thorough experimental study on two public benchmark data sets demonstrates that CLUE consistently outperforms state-of-the-art baselines on OKB canonicalization in terms of average F1 and on OKB linking in terms of accuracy.
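The abstract names a margin-based linking metric, an entropy-based cluster metric, and curriculum-driven seed selection inside an alternating refinement loop, but does not define them. The sketch below is a hypothetical illustration under stated assumptions, not CLUE's actual formulation. It assumes that the linking metric is the gap between the top-1 and top-2 cosine similarities of a noun-phrase (NP) embedding to candidate CKB entity embeddings; that the cluster metric is the normalized entropy of the entity labels linked within one NP cluster; and that the curriculum admits a linearly growing fraction of the most confident candidates each iteration. The function names (`linking_margin`, `cluster_entropy`, `curriculum_select`, `alternate`) and the refinement step (pulling a seed embedding toward its linked entity) are illustrative stand-ins.

```python
import math
from collections import Counter

import numpy as np


def linking_margin(np_vec, entity_vecs):
    """Return (index of best entity, top-1 minus top-2 cosine similarity).

    Assumed margin-based linking confidence: a wide gap between the best and
    second-best candidate suggests an unambiguous link. Needs >= 2 entities.
    """
    a = np_vec / np.linalg.norm(np_vec)
    b = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    sims = b @ a
    order = np.argsort(-sims)
    best, second = order[0], order[1]
    return int(best), float(sims[best] - sims[second])


def cluster_entropy(linked_entities):
    """Assumed entropy-based cluster metric, normalized to [0, 1].

    0.0: every member of the NP cluster links to the same entity.
    1.0: every member links to a different entity.
    """
    n = len(linked_entities)
    counts = Counter(linked_entities)
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def curriculum_select(scores, iteration, num_iters, higher_is_better=True):
    """Return the most confident candidates; the kept fraction grows linearly."""
    frac = min(1.0, (iteration + 1) / num_iters)
    k = max(1, math.ceil(frac * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ranked[:k]


def alternate(np_vecs, entity_vecs, num_iters=3, step=0.5):
    """Toy alternation of seed generation and embedding refinement.

    The refinement step (moving a seed's embedding toward its linked entity)
    is a hypothetical stand-in, not CLUE's training objective.
    """
    np_vecs = np.asarray(np_vecs, dtype=float).copy()
    for it in range(num_iters):
        # Seed generation: link every NP, then keep the confident fraction.
        links = {i: linking_margin(v, entity_vecs) for i, v in enumerate(np_vecs)}
        margins = {i: m for i, (_, m) in links.items()}
        seeds = curriculum_select(margins, it, num_iters)
        # Embedding refinement: pull each seed toward its linked entity.
        for i in seeds:
            target = entity_vecs[links[i][0]]
            np_vecs[i] = (1 - step) * np_vecs[i] + step * target
    return np_vecs
```

For example, with entity vectors `[1, 0]`, `[0, 1]`, and `[0.7, 0.7]`, the NP vector `[1, 0.1]` links to index 0 with a margin of about 0.22, because the diagonal entity is the runner-up. For cluster entropy, lower is better, so it would be passed to `curriculum_select` with `higher_is_better=False`. Ranking by confidence and admitting a growing fraction is only one common curriculum schedule; a confidence threshold that relaxes over iterations is an equally plausible reading of the abstract.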

Supplemental Material

MP4 File: video presentation (1212.79 MB)

MP4 File: Supplemental video (10.57 MB)


Index Terms

Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning
1. Information systems
  1. Data management systems
    1. Information integration
  2. Information systems applications
    1. Data mining
      1. Data cleaning


Information

Published In

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN:9798400701719

DOI:10.1145/3589334

General Chairs:
Tat-Seng Chua, National University of Singapore
Chong-Wah Ngo, Singapore Management University

Proceedings Chair:
Roy Ka-Wei Lee, Singapore University of Technology and Design

Program Chairs:
Ravi Kumar, Google
Hady W. Lauw, Singapore Management University

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2024

Qualifiers

Research-article

Funding Sources

CAAI-Huawei MindSpore Open Fund
National Natural Science Foundation of China

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Article Metrics

Total Citations: 0
Total Downloads: 146
Downloads (Last 12 months): 146
Downloads (Last 6 weeks): 9

Reflects downloads up to 14 Feb 2025
