Self-learning and embedding based entity alignment

Guan, Saiping; Jin, Xiaolong; Wang, Yuanzhuo; Jia, Yantao; Shen, Huawei; Li, Zixuan; Cheng, Xueqi

doi:10.1007/s10115-018-1191-0

Regular Paper
Published: 26 April 2018

Volume 59, pages 361–386, (2019)

Knowledge and Information Systems

Saiping Guan^1,2,
Xiaolong Jin^1,2,
Yuanzhuo Wang^1,2,
Yantao Jia^1,2,
Huawei Shen^1,2,
Zixuan Li^1,2 &
…
Xueqi Cheng^1,2


Abstract

Entity alignment aims to identify semantic matches between entities from different groups. Traditional methods (e.g., attribute comparison-based methods, graph operation-based methods and active learning methods) are usually supervised, relying on labeled data as prior knowledge. Because labeling training data is not trivial, researchers have turned to unsupervised methods, developing similarity-based methods, probabilistic methods, graphical model-based methods, etc. In addition, structure or class information has been further explored. As an important part of a knowledge graph, entities contain rich semantic information that knowledge graph embedding methods can learn well in low-dimensional vector spaces. However, existing methods for entity alignment have paid little attention to knowledge graph embedding. In this paper, we propose a self-learning and embedding based method for entity alignment, called SEEA, which iteratively finds semantically aligned entity pairs and makes full use of the semantic information contained in the attributes of entities. Experiments on three realistic datasets and comparisons with a few baseline methods validate the effectiveness and merits of the proposed method.
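The abstract does not spell out the SEEA algorithm, so the following is only a generic sketch of what an iterative, embedding-based alignment loop might look like, not the authors' method. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity, the mutual-nearest-neighbour acceptance rule, the threshold value, and the `retrain` callback that stands in for re-learning embeddings after each round.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mutual_best_pairs(emb_a, emb_b, threshold):
    """Pairs (a, b) that are each other's nearest neighbour with similarity >= threshold."""
    best_ab = {a: max(emb_b, key=lambda b: cosine(va, emb_b[b]))
               for a, va in emb_a.items()}
    best_ba = {b: max(emb_a, key=lambda a: cosine(emb_a[a], vb))
               for b, vb in emb_b.items()}
    return {(a, b) for a, b in best_ab.items()
            if best_ba[b] == a and cosine(emb_a[a], emb_b[b]) >= threshold}


def self_learning_align(emb_a, emb_b, retrain, threshold=0.95, max_rounds=10):
    """Hypothetical self-learning loop: accept confident pairs, retrain, repeat."""
    aligned = set()
    for _ in range(max_rounds):
        taken_a = {a for a, _ in aligned}
        taken_b = {b for _, b in aligned}
        cand_a = {k: v for k, v in emb_a.items() if k not in taken_a}
        cand_b = {k: v for k, v in emb_b.items() if k not in taken_b}
        if not cand_a or not cand_b:
            break
        new_pairs = mutual_best_pairs(cand_a, cand_b, threshold)
        if not new_pairs:
            break  # no confident pair left, stop iterating
        aligned |= new_pairs
        # Placeholder hook: a real system would re-learn embeddings here,
        # using the newly aligned pairs as additional supervision.
        emb_a, emb_b = retrain(emb_a, emb_b, aligned)
    return aligned


# Toy data: made-up 2-D embeddings for two small entity groups.
emb_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0]}
emb_b = {"b1": [0.9, 0.1], "b2": [0.1, 0.9], "b3": [1.0, -1.0]}

# Identity "retraining" keeps the toy example deterministic.
result = self_learning_align(emb_a, emb_b, retrain=lambda a, b, pairs: (a, b))
print(sorted(result))
```

In this toy run the pairs (a1, b1) and (a2, b2) are accepted in the first round, while a3 and b3 stay unaligned because their similarity falls below the threshold. Requiring mutual nearest neighbours is one conservative way to limit error propagation when accepted pairs are fed back into training; other acceptance rules are equally plausible.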

Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grants 2016YFB1000902 and 2017YFC0820404, and the National Natural Science Foundation of China under Grants 61772501, 61572473, 61572469, and 91646120.

Author information

Authors and Affiliations

CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Saiping Guan, Xiaolong Jin, Yuanzhuo Wang, Yantao Jia, Huawei Shen, Zixuan Li & Xueqi Cheng
School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
Saiping Guan, Xiaolong Jin, Yuanzhuo Wang, Yantao Jia, Huawei Shen, Zixuan Li & Xueqi Cheng

Corresponding author

Correspondence to Xiaolong Jin.

About this article

Cite this article

Guan, S., Jin, X., Wang, Y. et al. Self-learning and embedding based entity alignment. Knowl Inf Syst 59, 361–386 (2019). https://doi.org/10.1007/s10115-018-1191-0

Received: 02 October 2017
Revised: 10 March 2018
Accepted: 18 April 2018
Published: 26 April 2018
Issue Date: 07 May 2019
DOI: https://doi.org/10.1007/s10115-018-1191-0
