
On entity alignment at scale

  • Special Issue Paper
  • Published in The VLDB Journal

Abstract

Knowledge graphs (KGs) have received growing attention over the last decade as an effective approach to organizing and storing data. A KG can hardly reach completeness, since new data constantly emerge. To increase the scale and coverage of a KG, a possible solution is to incorporate data from other KGs, and entity alignment (EA) plays a vital role in this process. EA is the task of detecting entities that refer to the same real-world object but come from different KGs. Although a plethora of approaches have been put forward to tackle this task, they are mostly evaluated on small-scale datasets and cannot cope with large-scale data in practice. In this work, we study the task of EA at scale and put forward a novel solution that can manage large-scale KG pairs while achieving promising alignment performance. First, we devise seed-oriented graph partition strategies to divide large-scale KG pairs into smaller subgraph pairs. Next, within each subgraph pair, we learn unified entity representations using existing methods and conceive a novel reciprocal alignment inference strategy to model the bi-directional alignment interactions, which leads to more accurate alignment results. To further improve the scalability of reciprocal alignment inference, we put forward two variant strategies that significantly reduce the memory and time costs at the expense of a small drop in effectiveness. Our proposal is generic and can be applied to existing representation learning-based EA models to improve their capability of dealing with large-scale KG pairs. Finally, we build a new EA dataset with millions of entities and conduct detailed experiments to validate that our proposed model can effectively cope with EA at scale. We also evaluate our proposed model against state-of-the-art baselines on popular EA datasets, and the extensive experiments demonstrate its effectiveness and superiority.



Notes

  1. Note that the incorporation of additional information does not modify the nature of the EA problem. Thus, to concentrate on the major challenges while keeping the problem simple, we focus on using only structural information for EA.

  2. In the rest of the paper, we use “distance” and “similarity” interchangeably, with the obvious adaptation (a smaller distance corresponds to a higher similarity).

  3. Currently, in the mainstream EA benchmarks [14, 41], the number of source entities is equal to that of target entities, and the ground truth mappings conform to the 1-to-1 constraint. Nevertheless, in more general scenarios, these conditions do not necessarily hold [63].

  4. Notice that LIME is agnostic to the choice of structural learning models.

  5. The similarity measure sim is usually chosen from cosine similarity [41,42,43], Euclidean distance [8, 56, 64], or Manhattan distance [50,51,52].

  6. For tied elements, we follow the common practice and denote their ranks as the average of the ranks that would have been assigned to them.

  7. https://www.dbpedia.org/blog/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/.

  8. Note that we do not compare with I-SBP, since it selects confident EA pairs to augment training data, which improves both the partition and the representation learning process. It corresponds to the semi-supervised setting in previous EA works [29, 43], which is usually not compared with the methods without the semi-supervised setting (e.g., LIME in this work) for fairness.

  9. We omit the efficiency comparison of methods using additional information since the processing of additional information is complicated and it is difficult to provide a fair assessment.
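
To make notes 5 and 6 concrete, here is a minimal, dependency-free Python sketch of the three similarity measures and of average-rank assignment for tied elements; the function names are illustrative and not taken from the paper or any EA library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean distance (smaller distance = higher similarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Manhattan (L1) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))

def average_ranks(scores):
    """Rank scores in descending order (1 = best); tied elements receive
    the average of the ranks they would jointly occupy (cf. note 6)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, `average_ranks([0.9, 0.5, 0.5, 0.1])` yields `[1.0, 2.5, 2.5, 4.0]`: the two tied scores share the average of ranks 2 and 3.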

References

  1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: Dbpedia: a nucleus for a web of open data. In: ISWC, pp. 722–735 (2007)

  2. Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp. 1247–1250 (2008)

  3. Bordes, A., Usunier, N., García-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NIPS, pp. 2787–2795 (2013)

  4. Bourse, F., Lelarge, M., Vojnovic, M.: Balanced graph edge partition. In: KDD, pp. 1456–1465. ACM (2014)

  5. Cao, Y., Liu, Z., Li, C., Liu, Z., Li, J., Chua, T.: Multi-channel graph neural network for entity alignment. In: ACL, pp. 1452–1461 (2019)

  6. Chen, J., Li, Z., Zhao, P., Liu, A., Zhao, L., Chen, Z., Zhang, X.: Learning short-term differences and long-term dependencies for entity alignment. In: ISWC, vol. 12506, pp. 92–109 (2020)

  7. Chen, M., Tian, Y., Chang, K., Skiena, S., Zaniolo, C.: Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In: IJCAI, pp. 3998–4004 (2018)

  8. Chen, M., Tian, Y., Yang, M., Zaniolo, C.: Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In: IJCAI, pp. 1511–1517 (2017)

  9. Christophides, V., Efthymiou, V., Palpanas, T., Papadakis, G., Stefanidis, K.: An overview of end-to-end entity resolution for big data. ACM Comput. Surv. 53(6), 127:1–127:42 (2020)

  10. Cui, W., Xiao, Y., Wang, H., Song, Y., Hwang, S., Wang, W.: KBQA: learning question answering over QA corpora and knowledge bases. Proc. VLDB Endow. 10(5), 565–576 (2017)


  11. Fey, M., Lenssen, J.E., Morris, C., Masci, J., Kriege, N.M.: Deep graph matching consensus. In: ICLR. OpenReview.net (2020)

  12. Flamino, J., Abriola, C., Zimmerman, B., Li, Z., Douglas, J.: Robust and scalable entity alignment in big data. In: IEEE Big Data, pp. 2526–2533 (2020)

  13. Ge, C., Liu, X., Chen, L., Zheng, B., Gao, Y.: LargeEA: aligning entities for large-scale knowledge graphs. CoRR arXiv:2108.05211 (2021)

  14. Guo, L., Sun, Z., Hu, W.: Learning to exploit long-term relational dependencies in knowledge graphs. In: ICML, pp. 2505–2514 (2019)

  15. Hao, Y., Zhang, Y., He, S., Liu, K., Zhao, J.: A joint embedding method for entity alignment of knowledge bases. In: CCKS, pp. 3–14 (2016)

  16. He, F., Li, Z., Yang, Q., Liu, A., Liu, G., Zhao, P., Zhao, L., Zhang, M., Chen, Z.: Unsupervised entity alignment using attribute triples and relation triples. In: DASFAA, vol. 11446, pp. 367–382. Springer (2019)

  17. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)


  18. Karypis, G., Kumar, V.: METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices (1997)

  19. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)


  20. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR. OpenReview.net (2017)

  21. Lample, G., Conneau, A., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. In: ICLR. OpenReview.net (2018)

  22. Li, C., Cao, Y., Hou, L., Shi, J., Li, J., Chua, T.: Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. In: EMNLP, pp. 2723–2732. Association for Computational Linguistics (2019)

  23. Li, L., Li, T.: MEET: a generalized framework for reciprocal recommender systems. In: CIKM, pp. 35–44. ACM (2012)

  24. Liu, F., Chen, M., Roth, D., Collier, N.: Visual pivoting for (unsupervised) entity alignment. CoRR arXiv:2009.13603 (2020)

  25. Liu, Z., Cao, Y., Pan, L., Li, J., Chua, T.: Exploring and evaluating attributes, values, and structures for entity alignment. In: EMNLP, pp. 6355–6364 (2020)

  26. Liu, Z., Xiong, C., Sun, M., Liu, Z.: Entity-duet neural ranking: understanding the role of knowledge graph semantics in neural information retrieval. In: ACL, pp. 2395–2405 (2018)

  27. Mao, X., Wang, W., Wu, Y., Lan, M.: Boosting the speed of entity alignment 10 \(\times \): dual attention matching network with normalized hard sample mining. In: WWW, pp. 821–832 (2021)

  28. Mao, X., Wang, W., Xu, H., Lan, M., Wu, Y.: MRAEA: an efficient and robust entity alignment approach for cross-lingual knowledge graph. In: WSDM, pp. 420–428. ACM (2020)

  29. Mao, X., Wang, W., Xu, H., Wu, Y., Lan, M.: Relational reflection entity alignment. In: CIKM, pp. 1095–1104 (2020)

  30. Neve, J., Palomares, I.: Aggregation strategies in user-to-user reciprocal recommender systems. In: SMC, pp. 4031–4036. IEEE (2019)

  31. Nie, H., Han, X., Sun, L., Wong, C.M., Chen, Q., Wu, S., Zhang, W.: Global structure and local semantics-preserved embeddings for entity alignment. In: IJCAI, pp. 3658–3664 (2020)

  32. Palomares, I., Porcel, C., Pizzato, L.A., Guy, I., Herrera-Viedma, E.: Reciprocal recommender systems: analysis of state-of-art literature, challenges and opportunities on social recommendation. CoRR arXiv:2007.16120 (2020)

  33. Papadakis, G., Skoutas, D., Thanos, E., Palpanas, T.: Blocking and filtering techniques for entity resolution: a survey. ACM Comput. Surv. 53(2), 31:1-31:42 (2020)


  34. Pizzato, L.A.S., Rej, T., Chung, T., Koprinska, I., Kay, J.: RECON: a reciprocal recommender for online dating. In: RecSys, pp. 207–214. ACM (2010)

  35. Qu, M., Tang, J., Bengio, Y.: Weakly-supervised knowledge graph alignment with adversarial learning. CoRR arXiv:1907.03179 (2019)

  36. Shi, X., Xiao, Y.: Modeling multi-mapping relations for precise cross-lingual entity alignment. In: EMNLP, pp. 813–822. Association for Computational Linguistics (2019)

  37. Stoyanovich, J., Howe, B., Jagadish, H.V.: Responsible data management. Proc. VLDB Endow. 13(12), 3474–3488 (2020)


  38. Suchanek, F.M., Abiteboul, S., Senellart, P.: PARIS: probabilistic alignment of relations, instances, and schema. PVLDB 5(3), 157–168 (2011)


  39. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)

  40. Sun, Z., Chen, M., Hu, W., Wang, C., Dai, J., Zhang, W.: Knowledge association with hyperbolic knowledge graph embeddings. In: EMNLP, pp. 5704–5716 (2020)

  41. Sun, Z., Hu, W., Li, C.: Cross-lingual entity alignment via joint attribute-preserving embedding. In: ISWC, pp. 628–644 (2017)

  42. Sun, Z., Hu, W., Zhang, Q., Qu, Y.: Bootstrapping entity alignment with knowledge graph embedding. In: IJCAI, pp. 4396–4402 (2018)

  43. Sun, Z., Huang, J., Hu, W., Chen, M., Guo, L., Qu, Y.: Transedge: translating relation-contextualized embeddings for knowledge graphs. In: ISWC, pp. 612–629 (2019)

  44. Sun, Z., Wang, C., Hu, W., Chen, M., Dai, J., Zhang, W., Qu, Y.: Knowledge graph alignment network with gated multi-hop neighborhood aggregation. In: AAAI (2020)

  45. Tang, X., Zhang, J., Chen, B., Yang, Y., Chen, H., Li, C.: BERT-INT: a bert-based interaction model for knowledge graph alignment. In: IJCAI, pp. 3174–3180 (2020)

  46. Trisedya, B.D., Qi, J., Zhang, R.: Entity alignment between knowledge graphs using attribute embeddings. In: AAAI, pp. 297–304 (2019)

  47. Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction. In: ICML, vol. 48, pp. 2071–2080. JMLR.org (2016)

  48. Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)


  49. Wang, H., Zhang, F., Xie, X., Guo, M.: DKN: deep knowledge-aware network for news recommendation. In: WWW, pp. 1835–1844. ACM (2018)

  50. Wang, Z., Lv, Q., Lan, X., Zhang, Y.: Cross-lingual knowledge graph alignment via graph convolutional networks. In: EMNLP, pp. 349–357 (2018)

  51. Wu, Y., Liu, X., Feng, Y., Wang, Z., Yan, R., Zhao, D.: Relation-aware entity alignment for heterogeneous knowledge graphs. In: IJCAI, pp. 5278–5284 (2019)

  52. Wu, Y., Liu, X., Feng, Y., Wang, Z., Zhao, D.: Jointly learning entity and relation representations for entity alignment. In: EMNLP, pp. 240–249. Association for Computational Linguistics (2019)

  53. Wu, Y., Liu, X., Feng, Y., Wang, Z., Zhao, D.: Neighborhood matching network for entity alignment. In: ACL, pp. 6477–6487. Association for Computational Linguistics (2020)

  54. Xu, K., Song, L., Feng, Y., Song, Y., Yu, D.: Coordinated reasoning for cross-lingual knowledge graph alignment. In: AAAI, pp. 9354–9361. AAAI Press (2020)

  55. Yager, R.R., Rybalov, A.N.: Uninorm aggregation operators. Fuzzy Sets Syst. 80(1), 111–120 (1996)

  56. Yang, H., Zou, Y., Shi, P., Lu, W., Lin, J., Sun, X.: Aligning cross-lingual entities with multi-aspect information. In: EMNLP, pp. 4430–4440. Association for Computational Linguistics (2019)

  57. Yang, K., Liu, S., Zhao, J., Wang, Y., Xie, B.: COTSAE: co-training of structure and attribute embeddings for entity alignment. In: AAAI, pp. 3025–3032. AAAI Press (2020)

  58. Zeng, W., Zhao, X., Tang, J., Li, X., Luo, M., Zheng, Q.: Towards entity alignment in the open world: an unsupervised approach. In: DASFAA, vol. 12681, pp. 272–289 (2021)

  59. Zeng, W., Zhao, X., Tang, J., Lin, X.: Collective entity alignment via adaptive features. In: ICDE, pp. 1870–1873. IEEE (2020)

  60. Zeng, W., Zhao, X., Tang, J., Lin, X., Groth, P.: Reinforcement learning based collective entity alignment with adaptive features. ACM Trans. Inf. Syst. 39(3), 26:1–26:31 (2021)

  61. Zeng, W., Zhao, X., Wang, W., Tang, J., Tan, Z.: Degree-aware alignment for entities in tail. In: SIGIR, pp. 811–820. ACM (2020)

  62. Zhang, F., Liu, X., Tang, J., Dong, Y., Yao, P., Zhang, J., Gu, X., Wang, Y., Shao, B., Li, R., Wang, K.: OAG: toward linking large-scale heterogeneous entity graphs. In: SIGKDD, pp. 2585–2595 (2019)

  63. Zhao, X., Zeng, W., Tang, J., Wang, W., Suchanek, F.: An experimental study of state-of-the-art entity alignment approaches. IEEE Trans. Knowl. Data Eng. 1 (2020)

  64. Zhu, H., Xie, R., Liu, Z., Sun, M.: Iterative entity alignment via joint knowledge embeddings. In: IJCAI, pp. 4258–4264 (2017)

  65. Zhu, Q., Zhou, X., Wu, J., Tan, J., Guo, L.: Neighborhood-aware attentional representation for multilingual knowledge graphs. In: IJCAI, pp. 1943–1949 (2019)

  66. Zhu, Y., Liu, H., Wu, Z., Du, Y.: Relation-aware neighborhood matching model for entity alignment. In: AAAI, pp. 4749–4756 (2021)

  67. Zhuang, Y., Li, G., Zhong, Z., Feng, J.: Hike: A hybrid human-machine method for entity alignment in large-scale knowledge bases. In: CIKM, pp. 1917–1926 (2017)


Acknowledgements

This work was partially supported by NSFC under Grants Nos. 61872446 and 71971212 and NSF of Hunan Province under Grant No. 2019JJ20024.

Author information


Corresponding author

Correspondence to Xiang Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Correctness analysis

We discuss the performance of reciprocal alignment inference and the direct inference (baseline model) under different circumstances, as mentioned in Sect. 4.4.

Case 1 For the ground-truth entity pair (u, v):

$$\begin{aligned} \begin{aligned} u&= \mathop {\arg \max }\{sim({\varvec{v}},{\varvec{u}}^\prime ), u^\prime \in {\mathcal {E}}_s\}; \\ v&= \mathop {\arg \max }\{sim({\varvec{u}},{\varvec{v}}^\prime ), v^\prime \in {\mathcal {E}}_t\}, \end{aligned} \end{aligned}$$

We have already discussed this situation in Sect. 4.4, where the accurate similarity signal is provided by the representation learning process. In this case, both reciprocal alignment inference and direct alignment inference can generate the correct answer.

Case 2 For the ground-truth entity pair (u, v):

$$\begin{aligned} \begin{aligned} u&= \mathop {\arg \max }\{sim({\varvec{v}},{\varvec{u}}^\prime ), u^\prime \in {\mathcal {E}}_s\}; \\ v^*&= \mathop {\arg \max }\{sim({\varvec{u}},{\varvec{v}}^\prime ), v^\prime \in {\mathcal {E}}_t\}, \quad v^*\ne v, \end{aligned} \end{aligned}$$

In this case, the direct alignment inference cannot generate the correct answer, since it only considers u’s preference and would generate \((u,v^*)\) as the answer. In comparison, reciprocal alignment inference does not necessarily generate the correct answer. This is because we only know \(p_{u,v} = 1, p_{v,u} < 1\). Given any target entity \(v^\prime \), we can derive \(p_{u,v^\prime } \le 1, p_{v^\prime ,u} \le 1\). Thus, \(r_{u,v}\le r_{u,v^\prime }\), while we cannot compare \(r_{v,u}\) and \(r_{v^\prime ,u}\). Therefore, we also cannot compare \(p_{u\leftrightarrow v} = (r_{u,v} + r_{v,u})/2\) with \(p_{u\leftrightarrow v^\prime } = (r_{u,v^\prime } + r_{v^\prime ,u})/2\) as the exact values are unknown.

Case 3 For the ground-truth entity pair (u, v):

$$\begin{aligned} \begin{aligned} u^*&= \mathop {\arg \max }\{sim({\varvec{v}},{\varvec{u}}^\prime ), u^\prime \in {\mathcal {E}}_s\}, \quad u^*\ne u; \\ v&= \mathop {\arg \max }\{sim({\varvec{u}},{\varvec{v}}^\prime ), v^\prime \in {\mathcal {E}}_t\}, \end{aligned} \end{aligned}$$

In this case, the direct alignment inference can generate the correct answer since it only considers u’s preference. In comparison, reciprocal alignment inference does not necessarily generate the correct answer. This is because we only know \(p_{u,v} <1, p_{v,u} = 1\). Given any target entity \(v^\prime \), we can derive \(p_{u,v^\prime } \le 1, p_{v^\prime ,u} < 1\). Thus, \(r_{v,u}<r_{v^\prime ,u}\), while we cannot compare \(r_{u,v}\) and \(r_{u,v^\prime }\). Therefore, we also cannot compare \(p_{u\leftrightarrow v} = (r_{u,v} + r_{v,u})/2\) with \(p_{u\leftrightarrow v^\prime } = (r_{u,v^\prime } + r_{v^\prime ,u})/2\) as the exact values are unknown.

Case 4 For the ground-truth entity pair (u, v):

$$\begin{aligned} \begin{aligned} u^*&= \mathop {\arg \max }\{sim({\varvec{v}},{\varvec{u}}^\prime ), u^\prime \in {\mathcal {E}}_s\}, \quad u^*\ne u; \\ v^*&= \mathop {\arg \max }\{sim({\varvec{u}},{\varvec{v}}^\prime ), v^\prime \in {\mathcal {E}}_t\}, \quad v^*\ne v, \end{aligned} \end{aligned}$$

In this case, the direct alignment inference cannot generate the correct answer, since it only considers u’s preference and would generate \((u,v^*)\) as the answer. In comparison, reciprocal alignment inference does not necessarily generate the correct answer. This is because we only know \(p_{u,v}<1, p_{v,u} < 1\). Given any target entity \(v^\prime \), we can derive \(p_{u,v^\prime } \le 1, p_{v^\prime ,u} \le 1\). However, we cannot compare \(r_{v,u}\) and \(r_{v^\prime ,u}\), or \(r_{u,v}\) and \(r_{u,v^\prime }\). Therefore, we cannot compare \(p_{u\leftrightarrow v} = (r_{u,v} + r_{v,u})/2\) with \(p_{u\leftrightarrow v^\prime } = (r_{u,v^\prime } + r_{v^\prime ,u})/2\) as the exact values are unknown.

In summary, the direct alignment inference strategy generates correct results only under Case 1 and Case 3, while our proposed reciprocal inference strategy generates correct results under Case 1 and has a chance of producing correct answers in the remaining cases. Considering that the input similarity matrix is usually of relatively low accuracy (since the representation learning models cannot fully capture the relatedness among entities), our proposal tends to attain better results than direct alignment inference, which is empirically validated in the experiments.
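
The case analysis above can be illustrated with a minimal Python sketch of the two inference strategies, assuming a precomputed similarity matrix and ranks computed in both directions (rank 1 = most preferred), aggregated as \(p_{u\leftrightarrow v} = (r_{u,v} + r_{v,u})/2\) per Sect. 4.4; the function names are illustrative, not from the paper's implementation.

```python
def rank_desc(scores):
    """Rank a list of scores in descending order (rank 1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0] * len(scores)
    for pos, k in enumerate(order):
        ranks[k] = pos + 1
    return ranks

def align(sim):
    """sim[i][j]: similarity between source entity i and target entity j.
    Direct inference: each source picks its most similar target.
    Reciprocal inference: average the rank of target j in source i's
    preference list with the rank of source i in target j's list, and
    pick the target with the best (lowest) aggregated rank."""
    n_s, n_t = len(sim), len(sim[0])
    r_uv = [rank_desc(row) for row in sim]              # ranks from each source's view
    r_vu = [rank_desc([sim[i][j] for i in range(n_s)])  # ranks from each target's view
            for j in range(n_t)]
    direct = [max(range(n_t), key=lambda j: sim[i][j]) for i in range(n_s)]
    reciprocal = [min(range(n_t), key=lambda j: (r_uv[i][j] + r_vu[j][i]) / 2)
                  for i in range(n_s)]
    return direct, reciprocal
```

As a toy instance of Case 2: with `sim = [[0.60, 0.70, 0.10], [0.20, 0.90, 0.30], [0.10, 0.80, 0.85]]`, direct inference maps both of the first two sources to target 1, whereas reciprocal inference recovers the 1-to-1 assignment `[0, 1, 2]`, since target 0 ranks source 0 first even though source 0's own top choice is target 1.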


Cite this article

Zeng, W., Zhao, X., Li, X. et al. On entity alignment at scale. The VLDB Journal 31, 1009–1033 (2022). https://doi.org/10.1007/s00778-021-00703-3
