
Self-learning and embedding based entity alignment

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Entity alignment aims to identify semantic matches between entities from different groups. Traditional methods (e.g., attribute comparison-based, graph operation-based, and active learning methods) are usually supervised and rely on labeled data as prior knowledge. Because labeling data for training is not trivial, researchers have turned to unsupervised approaches, developing similarity-based, probabilistic, and graphical model-based methods, among others. Structure and class information has also been explored. Entities are an important part of a knowledge graph and carry rich semantic information that knowledge graph embedding methods can learn well in low-dimensional vector spaces. However, existing entity alignment methods have paid little attention to knowledge graph embedding. In this paper, we propose SEEA, a self-learning and embedding based method for entity alignment, which iteratively finds semantically aligned entity pairs and makes full use of the semantic information contained in the attributes of entities. Experiments on three realistic datasets and comparisons with several baseline methods validate the effectiveness and merits of the proposed method.
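To make the idea of iterative, self-learning alignment over embeddings more concrete, the following Python sketch may help. It is not the SEEA algorithm, whose details are not given in this abstract. It is a generic self-training loop under loudly stated assumptions: entity vectors are fixed toy inputs rather than learned knowledge graph embeddings, candidate pairs are mutual nearest neighbours within a distance threshold, and the feedback step (re-estimating a mean offset between the two vector spaces) is a hypothetical stand-in for retraining embeddings with the newly aligned pairs. All function names, parameters, and data are illustrative.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mutual_nearest(left, right, max_dist):
    """Pairs (l, r) that are each other's nearest neighbour within max_dist."""
    if not left or not right:
        return []
    nearest_right = {l: min(right, key=lambda r: distance(left[l], right[r]))
                     for l in left}
    nearest_left = {r: min(left, key=lambda l: distance(left[l], right[r]))
                    for r in right}
    return [(l, r) for l, r in nearest_right.items()
            if nearest_left[r] == l and distance(left[l], right[r]) <= max_dist]


def self_learning_align(left, right, max_dist=0.625, max_rounds=10):
    """Iteratively accept confident pairs and feed them back (toy stand-in)."""
    aligned = {}            # left entity -> right entity
    shifted = dict(right)   # right vectors after the current offset correction
    dim = len(next(iter(left.values())))
    for _ in range(max_rounds):
        taken = set(aligned.values())
        open_left = {k: v for k, v in left.items() if k not in aligned}
        open_right = {k: v for k, v in shifted.items() if k not in taken}
        new_pairs = mutual_nearest(open_left, open_right, max_dist)
        if not new_pairs:
            break           # nothing confident left to add
        aligned.update(new_pairs)
        # Assumed feedback step: average offset over all accepted pairs,
        # computed on the original vectors, then removed from the right side.
        offset = [sum(right[r][i] - left[l][i] for l, r in aligned.items())
                  / len(aligned) for i in range(dim)]
        shifted = {k: [x - o for x, o in zip(v, offset)]
                   for k, v in right.items()}
    return aligned


# Hypothetical toy vectors: the right side is the left side plus a drifting offset.
left_vectors = {"a": [0.0, 0.0], "b": [4.0, 0.0], "c": [0.0, 4.0]}
right_vectors = {"a2": [0.5, 0.0], "b2": [5.0, 0.0], "c2": [1.25, 4.0]}

print(self_learning_align(left_vectors, right_vectors))
# {'a': 'a2', 'b': 'b2', 'c': 'c2'}
```

With a single round only the pair (a, a2) clears the threshold; the later pairs are accepted only after the earlier ones have been fed back, which is the behaviour a self-learning scheme is meant to show.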


Notes

  1. http://movie.baidu.com/.

  2. https://movie.douban.com/.

  3. https://hpi.de/naumann/projects/data-quality-and-cleansing/dude-duplicate-detection.html#c115302.

  4. http://webdam.inria.fr/paris/.

  5. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/.

  6. http://www.imdb.com/interfaces/#plain.


Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grants 2016YFB1000902 and 2017YFC0820404, and the National Natural Science Foundation of China under Grants 61772501, 61572473, 61572469, and 91646120.

Author information

Corresponding author

Correspondence to Xiaolong Jin.

About this article

Cite this article

Guan, S., Jin, X., Wang, Y. et al. Self-learning and embedding based entity alignment. Knowl Inf Syst 59, 361–386 (2019). https://doi.org/10.1007/s10115-018-1191-0

  • DOI: https://doi.org/10.1007/s10115-018-1191-0
