Abstract
Record linkage (RL) is the process of identifying records in different databases that refer to the same entity. In many applications, the attribute values of records that belong to the same entity evolve over time; for example, people can change their surname or address. To identify the records that refer to the same entity over time, RL should therefore make use of temporal information, such as the time-stamp of when a record was created and/or last updated. However, when RL is conducted on information about people, privacy and confidentiality concerns mean that organisations are often not willing or allowed to share sensitive data in their databases, such as personal medical records or location and financial details, with other organisations. This paper proposes a scalable framework for privacy-preserving temporal record linkage that can link different databases while ensuring the privacy of the sensitive data in these databases. We propose two protocols that can be used in different linkage scenarios, with and without a third party. Our protocols use Bloom filter encoding that incorporates the temporal information available in records during the linkage process. Our approaches first securely calculate the probabilities of entities changing attribute values in their records over a period of time. Based on these probabilities, we then generate a set of masking Bloom filters to adjust the similarities between record pairs. We provide a theoretical analysis of the complexity and privacy of our techniques and conduct an empirical study on large real databases containing several million records. The experimental results show that our approaches can achieve better linkage quality than non-temporal PPRL while providing privacy to the individuals in the databases that are being linked.
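As an illustration of the Bloom filter encoding that the abstract builds on, the following is a minimal sketch of the basic non-temporal scheme: attribute values are split into q-grams, each q-gram is hashed into a bit array, and two encodings are compared with the Dice coefficient. It is not the protocol proposed in this paper; the temporal masking Bloom filters and the secure probability calculation are not shown. The function names, the parameters (`num_bits`, `num_hashes`, `q`), and the choice of SHA-1 and SHA-256 for double hashing are assumptions made for this example only.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its set of overlapping, lowercased q-grams."""
    s = value.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def bloom_encode(value, num_bits=1000, num_hashes=10, q=2):
    """Encode a string as a Bloom filter, returned as the set of set-bit positions.

    Illustrative parameters only; they are not taken from the paper.
    """
    bits = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        # Double hashing: position_i = (h1 + i * h2) mod num_bits
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % num_bits)
    return bits


def dice_similarity(bf_a, bf_b):
    """Dice coefficient between two Bloom filters given as sets of bit positions."""
    if not bf_a and not bf_b:
        return 0.0
    return 2.0 * len(bf_a & bf_b) / (len(bf_a) + len(bf_b))


if __name__ == "__main__":
    # A surname variation still shares q-grams, so the encodings overlap.
    print(dice_similarity(bloom_encode("smith"), bloom_encode("smyth")))
    print(dice_similarity(bloom_encode("smith"), bloom_encode("jones")))
```

In this sketch, values that share many q-grams set many of the same bit positions and so receive a high Dice score, which is why similarity can be approximated on the encodings without exchanging the plain-text values.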
References
Chiang YH, Doan A, Naughton JF (2014) Modeling entity evolution for temporal record matching. In: ACM SIGMOD, pp 1175–1186
Christen P (2012) Data matching–concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer, Berlin
Christen P, Gayler RW (2013) Adaptive temporal entity resolution on dynamic databases. In: PAKDD. Springer, pp 558–569
Christen P, Vatsalan D, Wang Q (2015) Efficient entity resolution with adaptive and interactive training data selection. In: IEEE ICDM
Christen P, Schnell R, Vatsalan D, Ranbaduge T (2017a) Efficient cryptanalysis of Bloom filters for privacy-preserving record linkage. In: PAKDD
Christen V, Groß A, Fisher J, Wang Q, Christen P, Rahm E (2017b) Temporal group linkage and evolution analysis for census data. In: EDBT, pp 620–631
Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving distributed data mining. SIGKDD Explor 4(2):28–34
Durham EA, Toth C, Kuzu M, Kantarcioglu M, Xue Y, Malin B (2013) Composite Bloom filters for secure record linkage. In: TKDE
Hand D, Christen P (2018) A note on using the F-measure for evaluating record linkage algorithms. Stat Comput 28(3):539–547
Hu Y, Wang Q, Vatsalan D, Christen P (2017) Improving temporal record linkage using regression classification. In: PAKDD
Inan A, Kantarcioglu M, Ghinita G, Bertino E (2010) Private record matching using differential privacy. In: International conference on extending database technology. ACM, pp 123–134
Karakasidis A, Verykios V (2011) Secure blocking + secure matching = secure record linkage. J Comput Sci Eng 5(3):223–235
Li F, Lee ML, Hsu W, Tan WC (2015) Linking temporal records for profiling entities. In: ACM SIGMOD, pp 593–605
Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. VLDB Endowment 4(11):956–967
Lin HY, Tzeng WG (2005) An efficient solution to the Millionaires’ problem based on homomorphic encryption. In: Applied cryptography and network security. Springer, pp 456–466
Lindell Y, Pinkas B (2009) Secure multiparty computation for privacy-preserving data mining. JPC 1(1):5
Lyubashevsky V, Peikert C, Regev O (2012) On ideal lattices and learning with errors over rings. Cryptology ePrint Archive, Report 2012/230, https://eprint.iacr.org/2012/230
Naehrig M, Lauter K, Vaikuntanathan V (2011) Can homomorphic encryption be practical? In: 3rd ACM workshop on cloud computing security workshop. ACM
Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: EUROCRYPT. Springer, pp 223–238
Ranbaduge T, Christen P (2018) Privacy-preserving temporal record linkage. In: IEEE ICDM, pp 1161–1171
Ranbaduge T, Vatsalan D, Christen P (2014) Tree based scalable indexing for multi-party privacy-preserving record linkage. In: AusDM, CRPIT 158. Brisbane
Ranbaduge T, Vatsalan D, Christen P (2015) Clustering-based scalable indexing for multi-party privacy-preserving record linkage. In: PAKDD’09. Springer LNAI, Vietnam
Randall S, Ferrante A, Boyd J, Semmens J (2013) The effect of data cleaning on record linkage quality. BMC Med Inform Decis Mak 13:64
Randall SM, Ferrante AM, Boyd JH, Bauer JK, Semmens JB (2014) Privacy-preserving record linkage on large real world datasets. JBI 50:205
Schnell R, Bachteler T, Reiher J (2009) Privacy-preserving record linkage using Bloom filters. BMC Med Inform Decis Mak 9:41
Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
Vatsalan D, Christen P (2012) An iterative two-party protocol for scalable privacy-preserving record linkage. In: AusDM, CRPIT 134. Sydney, Australia
Vatsalan D, Christen P (2013) Sorted nearest neighborhood clustering for efficient private blocking. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 341–352
Vatsalan D, Christen P (2014) Scalable privacy-preserving record linkage for multiple databases. In: ACM CIKM, pp 1795–1798
Vatsalan D, Christen P (2016) Multi-party privacy-preserving record linkage using Bloom filters. arXiv preprint arXiv:1612.08835
Vatsalan D, Christen P, Verykios V (2013a) Efficient two-party private blocking based on sorted nearest neighborhood clustering. In: ACM CIKM. San Francisco, pp 1949–1958
Vatsalan D, Christen P, Verykios VS (2013b) A taxonomy of privacy-preserving record linkage techniques. JIS 38(6):946
Vatsalan D, Sehili Z, Christen P, Rahm E (2017) Privacy-preserving record linkage for big data: current approaches and research challenges. Springer, Berlin, pp 851–895
Yakout M, Atallah M, Elmagarmid A (2009) Efficient private record linkage. In: IEEE international conference on data engineering, pp 1283–1286
Yao AC (1982) Protocols for secure computations. In: IEEE SFCS
Yasuda M, Shimoyama T, Kogure J, Yokoyama K, Koshiba T (2015) New packing method in somewhat homomorphic encryption and its applications. Secur Commun Netw 8(13):2194–2213
Acknowledgements
This work was funded by the Australian Research Council under Discovery Project DP160101934.
Additional information
This paper is an extended version of the full paper published in the proceedings of the IEEE International Conference on Data Mining (ICDM) 2018 [20].
About this article
Cite this article
Ranbaduge, T., Christen, P. A scalable privacy-preserving framework for temporal record linkage. Knowl Inf Syst 62, 45–78 (2020). https://doi.org/10.1007/s10115-019-01370-1
DOI: https://doi.org/10.1007/s10115-019-01370-1