Using Knowledge Graphs for Record Linkage: Challenges and Opportunities

  • Conference paper
Advanced Information Systems Engineering Workshops (CAiSE 2023)

Abstract

In this paper, we explore how Knowledge Graphs (KGs) can potentially benefit Record Linkage (RL). RL is the process of identifying and resolving duplicate records across different data sources, including structured, semi-structured, and unstructured data (e.g., in data lakes). RL is a critical task for information systems that rely on data to make decisions, and it is used in a wide variety of fields such as healthcare, finance, government, and marketing. Due to recent advances in machine learning, there has been significant progress in building automated RL methods. However, in vertical applications, which feature specialized domains such as a particular hospital or industry, human experts are still required to enter domain-specific knowledge, making RL prohibitively expensive. Although KGs can be powerful tools for representing and deriving domain-specific knowledge, their application to RL has been overlooked. Inspired by a healthcare case study in the Republic of Cyprus, we aim to fill this gap by identifying challenges and opportunities of using KGs to reduce the effort of solving RL in vertical applications.
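
To make the idea in the abstract concrete, the following is a minimal Python sketch of pairwise RL in which a small set of KG-style triples stands in for domain knowledge that an expert would otherwise enter by hand. It is an illustration under stated assumptions, not the method proposed in the paper: the triples, the records, the `same_as` relation, the function names, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical domain knowledge as (subject, relation, object) triples.
# These facts are invented for illustration; they do not come from the paper.
KG_TRIPLES = [
    ("paracetamol", "same_as", "acetaminophen"),
    ("cardiology dept", "same_as", "cardiology department"),
]


def canonical(term, triples):
    """Lowercase a term and rewrite it via a matching same_as triple, if any."""
    term = term.strip().lower()
    for subj, rel, obj in triples:
        if rel == "same_as" and term == subj:
            return obj
    return term


def similarity(rec_a, rec_b, triples):
    """Average string similarity over the fields the two records share."""
    fields = rec_a.keys() & rec_b.keys()
    scores = [
        SequenceMatcher(
            None, canonical(rec_a[f], triples), canonical(rec_b[f], triples)
        ).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores) if scores else 0.0


def link(source_a, source_b, triples, threshold=0.85):
    """Return (i, j) index pairs whose similarity reaches the threshold."""
    return [
        (i, j)
        for (i, rec_a), (j, rec_b) in product(enumerate(source_a), enumerate(source_b))
        if similarity(rec_a, rec_b, triples) >= threshold
    ]


# Two toy sources describing the same patient with different drug names.
source_a = [{"name": "Maria Georgiou", "drug": "Paracetamol"}]
source_b = [
    {"name": "Maria Georgiou", "drug": "acetaminophen"},
    {"name": "Andreas Nicolaou", "drug": "ibuprofen"},
]

links_with_kg = link(source_a, source_b, KG_TRIPLES)
links_without_kg = link(source_a, source_b, [])
```

In this toy setting, the two records for the same patient are linked only when the synonym triple is available; with an empty triple list the differing drug names pull the average similarity below the threshold. A real system would replace the string comparison and the flat triple list with learned matchers and a proper KG.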

Acknowledgments

This work was partly supported by the SEED PNR 2021 grant FLOWER, Sapienza Research Project B83C22007180001, the European Union Next-Generation EU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, initiative “Future Artificial Intelligence Research” – FAIR), and the Horizon 2020 project 857420 DESTINI. Jerin George Mathew is financed by the Italian National PhD Program in AI.

Author information

Corresponding author

Correspondence to Jerin George Mathew.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Andreou, A.S., Firmani, D., Mathew, J.G., Mecella, M., Pingos, M. (2023). Using Knowledge Graphs for Record Linkage: Challenges and Opportunities. In: Ruiz, M., Soffer, P. (eds) Advanced Information Systems Engineering Workshops. CAiSE 2023. Lecture Notes in Business Information Processing, vol 482. Springer, Cham. https://doi.org/10.1007/978-3-031-34985-0_15

  • DOI: https://doi.org/10.1007/978-3-031-34985-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34984-3

  • Online ISBN: 978-3-031-34985-0

  • eBook Packages: Computer Science, Computer Science (R0)
