An enhanced privacy-preserving record linkage approach for multiple databases

Cluster Computing

Abstract

For research purposes, organizations often need to share and link records that belong to the same individual while protecting that individual's privacy, a task known as privacy-preserving record linkage (PPRL). Various approaches have been developed to tackle this problem; however, it remains challenging because of the massive amount of data, the multiple data sources involved, and ‘dirty’ data. This paper therefore proposes an enhanced approximate multi-party PPRL (MP-PPRL) approach that improves privacy, scalability, and linkage quality. For privacy, the Bloom filter (BF) is a more efficient masking technique than the alternatives available so far, so records are encoded into BFs. However, BFs may be compromised through frequency-based attacks. To resist such attacks, we propose a distributed protocol that introduces multiple linkage units (Multi-LUs). For scalability, we develop a blocking technique, called BF-SNN, that is based on the sorted nearest neighborhood (SNN) approach and clusters similar BFs across multiple databases, which dramatically reduces complexity. For linkage quality, we introduce a personalized threshold that varies with the level of ‘dirty’ data; it provides more accurate error tolerance and consequently improves linkage quality. An analysis and an empirical study on large real-world datasets show the benefit of the proposed approach.
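
To make the BF encoding step concrete, the following is a minimal sketch of how a string attribute can be masked as a Bloom filter and how two filters can be compared approximately. It is not the authors' implementation: the function names, the bigram tokenisation, the double-hashing scheme, the filter length, the number of hash functions, and the Dice coefficient as the similarity measure are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its set of overlapping q-grams (bigrams by default)."""
    text = value.lower().strip()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def encode_bf(value, length=1000, num_hashes=10, q=2):
    """Encode a string as a Bloom filter.

    The filter is represented as the set of bit positions that are set to 1.
    Positions are derived by double hashing (an assumed scheme):
    position_i = (h1 + i * h2) mod length.
    """
    bits = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % length)
    return bits


def dice(bf_a, bf_b):
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    total = len(bf_a) + len(bf_b)
    return 2 * len(bf_a & bf_b) / total if total else 0.0


def is_match(bf_a, bf_b, threshold=0.8):
    """Classify a pair as a match if its similarity reaches the threshold.

    The fixed threshold of 0.8 is a hypothetical placeholder, not the
    personalized threshold proposed in the paper.
    """
    return dice(bf_a, bf_b) >= threshold


if __name__ == "__main__":
    a, b = encode_bf("smith"), encode_bf("smyth")
    print(round(dice(a, b), 3), is_match(a, b))
```

Because similar strings share q-grams, their filters share set bits, so a similarity above a threshold can tolerate typographical errors without revealing the original values. A threshold that adapts to the level of 'dirty' data, as the abstract describes, would replace the fixed value assumed here.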

Data availability

Enquiries about data availability should be directed to the authors.

Acknowledgements

This work is supported by the National Basic Research 973 Program of China under Grant No. 2012CB316201 and the National Natural Science Foundation of China under Grant Nos. 61472070, 61672142, U1435216, and 61602103.

Funding

The authors have not disclosed any funding.

Author information

Corresponding author

Correspondence to Derong Shen.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Han, S., Shen, D., Nie, T. et al. An enhanced privacy-preserving record linkage approach for multiple databases. Cluster Comput 25, 3641–3652 (2022). https://doi.org/10.1007/s10586-022-03590-7

  • DOI: https://doi.org/10.1007/s10586-022-03590-7
