Abstract
For research purposes, organizations often need to share and link data belonging to the same individual while protecting that individual's privacy; this task is referred to as privacy-preserving record linkage (PPRL). Various approaches have been developed to tackle this problem. However, it remains challenging because of the massive volume of data, the multiple data sources involved, and ‘dirty’ data. Therefore, in this paper, an enhanced approximate multi-party PPRL (MP-PPRL) approach is proposed to improve privacy, scalability, and linkage quality. For privacy, the Bloom filter (BF) is, so far, a better and more efficient masking technique than the alternatives, so the records are encoded into BFs. However, BFs may be compromised through frequency-based attacks. To enhance privacy, a distributed protocol that introduces multiple linkage units (Multi-LUs) to resist frequency-based attacks is proposed. For scalability, we develop a blocking technique based on the sorted nearest neighborhood (SNN) approach, called BF-SNN, which clusters similar BFs across multiple databases and dramatically reduces complexity. For linkage quality, a personalized threshold that varies with the level of ‘dirty’ data is introduced; it provides more accurate error tolerance for ‘dirty’ data and consequently improves linkage quality. An analysis and an empirical study are conducted on large real-world datasets to demonstrate the benefits of the proposed approach.
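The BF encoding, sorted-neighborhood blocking, and similarity comparison named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BF-SNN implementation: the helper names (`encode_bf`, `snn_candidates`, `link`) are hypothetical, and the BF length (100 bits), the number of hash functions (10), padded bigrams, unkeyed SHA-1/MD5 double hashing, the sort key, and the single fixed Dice threshold are all assumptions made for illustration. The Multi-LU protocol and the personalized threshold are not modelled, and unkeyed hashes like these give no protection against frequency-based attacks.

```python
import hashlib

# Illustrative parameters (assumptions, not taken from the paper).
BF_LEN = 100   # Bloom filter length in bits
NUM_HASH = 10  # hash functions applied to each q-gram


def qgrams(value, q=2):
    """Return the set of padded q-grams (bigrams by default) of a string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode_bf(value, bf_len=BF_LEN, k=NUM_HASH):
    """Hypothetical helper: encode a string into a Bloom filter, represented
    as the frozenset of its set-bit positions, via double hashing
    (h1 + i*h2) mod bf_len. Real PPRL deployments use keyed HMACs instead."""
    bits = set()
    for gram in qgrams(value):
        h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.md5(gram.encode()).hexdigest(), 16)
        for i in range(k):
            bits.add((h1 + i * h2) % bf_len)
    return frozenset(bits)


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) of two Bloom filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def snn_candidates(records, window=3):
    """Hypothetical helper: generic sorted-neighborhood blocking.
    records is a list of (db_id, rec_id, bf) triples. They are sorted by an
    assumed key (the sorted tuple of set-bit positions), and each record is
    paired only with the next window-1 records that come from other databases."""
    ordered = sorted(records, key=lambda r: tuple(sorted(r[2])))
    pairs = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:i + window]:
            if a[0] != b[0]:  # skip pairs from the same database
                pairs.add((a[:2], b[:2]))
    return pairs


def link(records, threshold=0.8, window=3):
    """Hypothetical helper: keep candidate pairs whose Dice similarity reaches
    one fixed threshold (a stand-in, not the paper's personalized threshold)."""
    bf_of = {r[:2]: r[2] for r in records}
    return sorted(
        (a, b) for a, b in snn_candidates(records, window)
        if dice(bf_of[a], bf_of[b]) >= threshold
    )


# Toy records from three hypothetical databases A, B and C.
records = [
    ("A", 1, encode_bf("smith")),
    ("B", 1, encode_bf("smyth")),
    ("C", 1, encode_bf("smith")),
    ("A", 2, encode_bf("jones")),
]
print(link(records))
```

Sorting brings similar encodings close together, so each record is compared with at most `window - 1` neighbors, giving O(n·w) comparisons instead of all cross-database pairs. The choice of sort key is the crux of any SNN variant; the one used here is only a placeholder.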
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
This work is supported by the National Basic Research 973 Program of China under Grant No. 2012CB316201 and the National Natural Science Foundation of China under Grant Nos. 61472070, 61672142, U1435216, and 61602103.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
About this article
Cite this article
Han, S., Shen, D., Nie, T. et al. An enhanced privacy-preserving record linkage approach for multiple databases. Cluster Comput 25, 3641–3652 (2022). https://doi.org/10.1007/s10586-022-03590-7
DOI: https://doi.org/10.1007/s10586-022-03590-7