Efficient and Practical Approach for Private Record Linkage

Published: 01 August 2012

Abstract

Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how much their customer bases overlap so that they can better assess the benefits of the merger. As another example, a database of people whom regulators have forbidden from a certain activity may need to be compared to a list of people engaged in that activity. The autonomous entities that wish to carry out the record-matching computation are often reluctant to fully share their data: they fear losing control over its subsequent dissemination and usage, they want to ensure privacy because the data is proprietary or confidential, or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better execution time than previous schemes while maintaining acceptable output quality compared to non-private settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching by carrying out a very fast (but not accurate) matching between pairs of records. The second phase is a novel protocol for efficiently computing the distance between the records of each candidate pair, without expensive cryptographic operations such as modular exponentiations. Our experimental evaluation validates these claims.
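
The two-phase structure described in the abstract (a fast, inexact filter that proposes candidate pairs, followed by a distance computation on only those pairs) can be illustrated with a plain, non-private sketch. Everything below is an assumption made for illustration: the shared-bigram filter, the use of Levenshtein distance, the thresholds, and all function names are hypothetical and are not taken from the paper. None of the privacy-preserving machinery is modeled, so both record lists are visible to one party here.

```python
from itertools import product


def bigrams(s):
    """Return the set of character bigrams of s (a cheap signature)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def candidate_pairs(left, right, min_shared=2):
    """Phase 1 (illustrative): fast, inexact filter.

    Keeps index pairs whose values share at least min_shared bigrams.
    """
    left_sigs = [bigrams(a) for a in left]
    right_sigs = [bigrams(b) for b in right]
    return [
        (i, j)
        for i, j in product(range(len(left)), range(len(right)))
        if len(left_sigs[i] & right_sigs[j]) >= min_shared
    ]


def edit_distance(a, b):
    """Phase 2 (illustrative): exact Levenshtein distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def link(left, right, max_distance=2, min_shared=2):
    """Run both phases and return the matched (left, right) value pairs."""
    return [
        (left[i], right[j])
        for i, j in candidate_pairs(left, right, min_shared)
        if edit_distance(left[i], right[j]) <= max_distance
    ]


if __name__ == "__main__":
    party_a = ["jonathan smith", "maria garcia"]
    party_b = ["jonathon smith", "mario rossi", "peter chan"]
    print(candidate_pairs(party_a, party_b))
    print(link(party_a, party_b))
```

The point of the split is cost: the expensive distance computation runs only on pairs that survive the cheap filter, rather than on every pair in the cross product.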

Published In

Journal of Data and Information Quality, Volume 3, Issue 3
August 2012
53 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/2287714
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2012
Accepted: 01 April 2012
Revised: 01 March 2012
Received: 01 September 2009
Published in JDIQ Volume 3, Issue 3

Author Tags

  1. Record linkage
  2. integration
  3. linkage
  4. privacy
  5. private information retrieval
  6. private linkage
  7. secure scalar product

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2020) Big Data Privacy in Biomedical Research. IEEE Transactions on Big Data 6:2 (296-308). DOI: 10.1109/TBDATA.2016.2608848. Online publication date: 1-Jun-2020
  • (2018) Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party. 2018 16th Annual Conference on Privacy, Security and Trust (PST) (1-10). DOI: 10.1109/PST.2018.8514192. Online publication date: Aug-2018
  • (2018) Perfectly Secure and Efficient Two-Party Electronic-Health-Record Linkage. IEEE Internet Computing 22:2 (32-41). DOI: 10.1109/MIC.2018.112102542. Online publication date: Mar-2018
  • (2015) A Model-Based Approach for Developing Data Cleansing Solutions. Journal of Data and Information Quality (JDIQ) 5:4 (1-28). DOI: 10.1145/2641575. Online publication date: 2-Mar-2015
  • (2015) A Review of Privacy Preserving Mechanisms for Record Linkage. Medical Data Privacy Handbook (233-265). DOI: 10.1007/978-3-319-23633-9_10. Online publication date: 2015
  • (2014) Record Linkage in Data Warehousing. Encyclopedia of Information Science and Technology, Third Edition (1958-1967). DOI: 10.4018/978-1-4666-5888-2.ch189. Online publication date: 31-Jul-2014
  • (2014) Efficient protocols for private record linkage. Proceedings of the 29th Annual ACM Symposium on Applied Computing (1688-1694). DOI: 10.1145/2554850.2555001. Online publication date: 24-Mar-2014
  • (2012) An iterative two-party protocol for scalable privacy-preserving record linkage. Proceedings of the Tenth Australasian Data Mining Conference - Volume 134 (127-138). DOI: 10.5555/2525373.2525389. Online publication date: 5-Dec-2012
