A scalable privacy-preserving framework for temporal record linkage

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Record linkage (RL) is the process of identifying records in different databases that refer to the same entity. In many applications, the attribute values of records that belong to the same entity evolve over time; for example, people can change their surname or address. To identify the records that refer to the same entity over time, RL should therefore make use of temporal information, such as the time-stamp of when a record was created and/or last updated. However, when RL is conducted on information about people, privacy and confidentiality concerns mean that organisations are often unwilling or not allowed to share sensitive data in their databases, such as personal medical records or location and financial details, with other organisations. This paper proposes a scalable framework for privacy-preserving temporal record linkage that can link different databases while ensuring the privacy of the sensitive data in these databases. We propose two protocols that can be used in different linkage scenarios, with and without a third party. Our protocols use Bloom filter encoding that incorporates the temporal information available in records during the linkage process. Our approaches first securely calculate the probabilities of entities changing attribute values in their records over a period of time. Based on these probabilities, we then generate a set of masking Bloom filters to adjust the similarities between record pairs. We provide a theoretical analysis of the complexity and privacy of our techniques and conduct an empirical study on large real databases containing several millions of records. The experimental results show that our approaches can achieve better linkage quality than non-temporal privacy-preserving record linkage (PPRL) while providing privacy to individuals in the databases that are being linked.
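
To make the idea of Bloom filter encoding with a masking filter more concrete, the following is a minimal sketch in Python and not the protocols proposed in the paper. It assumes the commonly used q-gram Bloom filter encoding with Dice coefficient similarity [25]. The function names (`qgrams`, `bloom_encode`, `encode_record`, `dice`), the filter length, the number of hash functions, the per-attribute segment layout, and the hand-written mask are all illustrative assumptions. In the paper the masks are derived from securely calculated change probabilities, which this sketch does not reproduce, and the unkeyed hashing used here provides no real privacy protection.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_encode(value, length=64, num_hashes=4):
    """Hash each q-gram of `value` into a Bloom filter (list of 0/1 bits)."""
    bits = [0] * length
    for gram in qgrams(value):
        for seed in range(num_hashes):
            # Assumption: unkeyed SHA-256 with a seed prefix, for illustration only.
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).hexdigest()
            bits[int(digest, 16) % length] = 1
    return bits


def encode_record(first_name, surname, length=64):
    """Assumed layout: one Bloom filter segment per attribute, concatenated."""
    return bloom_encode(first_name, length) + bloom_encode(surname, length)


def dice(bf_a, bf_b, mask=None):
    """Dice coefficient of two Bloom filters, restricted to positions where mask is 1."""
    if mask is None:
        mask = [1] * len(bf_a)
    a = [x & m for x, m in zip(bf_a, mask)]
    b = [x & m for x, m in zip(bf_b, mask)]
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * common / total if total else 0.0


# Two records of the same (hypothetical) person before and after a surname change.
old_record = encode_record("anna", "smith")
new_record = encode_record("anna", "jones")

# Hypothetical mask that ignores the surname segment, standing in for an
# attribute that is assumed likely to change over the time gap between records.
surname_mask = [1] * 64 + [0] * 64

unmasked = dice(old_record, new_record)
masked = dice(old_record, new_record, surname_mask)
print(f"unmasked similarity: {unmasked:.3f}")
print(f"masked similarity:   {masked:.3f}")
```

With the surname segment masked out, only the identical first-name segments are compared, so the masked similarity is 1.0, whereas the unmasked similarity is lowered by the changed surname. This illustrates why adjusting similarities for attributes that are likely to change can help a temporal linkage recognise records of the same entity.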

References

  1. Chiang YH, Doan A, Naughton JF (2014) Modeling entity evolution for temporal record matching. In: ACM SIGMOD, pp 1175–1186

  2. Christen P (2012) Data matching–concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer, Berlin

  3. Christen P, Gayler RW (2013) Adaptive temporal entity resolution on dynamic databases. In: PAKDD. Springer, pp 558–569

  4. Christen P, Vatsalan D, Wang Q (2015) Efficient entity resolution with adaptive and interactive training data selection. In: IEEE ICDM

  5. Christen P, Schnell R, Vatsalan D, Ranbaduge T (2017a) Efficient cryptanalysis of Bloom filters for privacy-preserving record linkage. In: PAKDD

  6. Christen V, Groß A, Fisher J, Wang Q, Christen P, Rahm E (2017b) Temporal group linkage and evolution analysis for census data. In: EDBT, pp 620–631

  7. Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving distributed data mining. SIGKDD Explor 4(2):28–34

  8. Durham EA, Toth C, Kuzu M, Kantarcioglu M, Xue Y, Malin B (2013) Composite Bloom filters for secure record linkage. In: TKDE

  9. Hand D, Christen P (2018) A note on using the F-measure for evaluating record linkage algorithms. Stat Comput 28(3):539–547

  10. Hu Y, Wang Q, Vatsalan D, Christen P (2017) Improving temporal record linkage using regression classification. In: PAKDD

  11. Inan A, Kantarcioglu M, Ghinita G, Bertino E (2010) Private record matching using differential privacy. In: International conference on extending database technology. ACM, pp 123–134

  12. Karakasidis A, Verykios V (2011) Secure blocking + secure matching = secure record linkage. J Comput Sci Eng 5(3):223–235

  13. Li F, Lee ML, Hsu W, Tan WC (2015) Linking temporal records for profiling entities. In: ACM SIGMOD, pp 593–605

  14. Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. VLDB Endowment 4(11):956–967

  15. Lin HY, Tzeng WG (2005) An efficient solution to the Millionaires’ problem based on homomorphic encryption. In: Applied cryptography and network security. Springer, pp 456–466

  16. Lindell Y, Pinkas B (2009) Secure multiparty computation for privacy-preserving data mining. JPC 1(1):5

  17. Lyubashevsky V, Peikert C, Regev O (2012) On ideal lattices and learning with errors over rings. Cryptology ePrint Archive, Report 2012/230, https://eprint.iacr.org/2012/230

  18. Naehrig M, Lauter K, Vaikuntanathan V (2011) Can homomorphic encryption be practical? In: 3rd ACM workshop on cloud computing security workshop. ACM

  19. Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: EUROCRYPT. Springer, pp 223–238

  20. Ranbaduge T, Christen P (2018) Privacy-preserving temporal record linkage. In: IEEE ICDM, pp 1161–1171

  21. Ranbaduge T, Vatsalan D, Christen P (2014) Tree based scalable indexing for multi-party privacy-preserving record linkage. In: AusDM, CRPIT 158. Brisbane

  22. Ranbaduge T, Vatsalan D, Christen P (2015) Clustering-based scalable indexing for multi-party privacy-preserving record linkage. In: PAKDD’09. Springer LNAI, Vietnam

  23. Randall S, Ferrante A, Boyd J, Semmens J (2013) The effect of data cleaning on record linkage quality. BMC Med Inform Decis Mak 13:64

  24. Randall SM, Ferrante AM, Boyd JH, Bauer JK, Semmens JB (2014) Privacy-preserving record linkage on large real world datasets. JBI 50:205

  25. Schnell R, Bachteler T, Reiher J (2009) Privacy-preserving record linkage using Bloom filters. BMC Med Inform Decis Mak 9:41

  26. Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

  27. Vatsalan D, Christen P (2012) An iterative two-party protocol for scalable privacy-preserving record linkage. In: AusDM, CRPIT 134. Sydney, Australia

  28. Vatsalan D, Christen P (2013) Sorted nearest neighborhood clustering for efficient private blocking. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 341–352

  29. Vatsalan D, Christen P (2014) Scalable privacy-preserving record linkage for multiple databases. In: ACM CIKM, pp 1795–1798

  30. Vatsalan D, Christen P (2016) Multi-party privacy-preserving record linkage using Bloom filters. arXiv preprint arXiv:1612.08835

  31. Vatsalan D, Christen P, Verykios V (2013a) Efficient two-party private blocking based on sorted nearest neighborhood clustering. In: ACM CIKM. San Francisco, pp 1949–1958

  32. Vatsalan D, Christen P, Verykios VS (2013b) A taxonomy of privacy-preserving record linkage techniques. JIS 38(6):946

  33. Vatsalan D, Sehili Z, Christen P, Rahm E (2017) Privacy-preserving record linkage for big data: current approaches and research challenges. Springer, Berlin, pp 851–895

  34. Yakout M, Atallah M, Elmagarmid A (2009) Efficient private record linkage. In: IEEE international conference on data engineering, pp 1283–1286

  35. Yao AC (1982) Protocols for secure computations. In: IEEE SFCS

  36. Yasuda M, Shimoyama T, Kogure J, Yokoyama K, Koshiba T (2015) New packing method in somewhat homomorphic encryption and its applications. Secur Commun Netw 8(13):2194–2213

Acknowledgements

This work was funded by the Australian Research Council under Discovery Project DP160101934.

Author information

Corresponding author

Correspondence to Thilina Ranbaduge.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of the full paper published in the proceedings of the IEEE International Conference on Data Mining (ICDM) 2018 [20].

About this article

Cite this article

Ranbaduge, T., Christen, P. A scalable privacy-preserving framework for temporal record linkage. Knowl Inf Syst 62, 45–78 (2020). https://doi.org/10.1007/s10115-019-01370-1

  • DOI: https://doi.org/10.1007/s10115-019-01370-1
