ABSTRACT
In this paper, we study the problem of privacy-preserving record linkage, which aims to link records across databases without revealing anything about the non-linked records. We propose a new secure embedding strategy based on frequent variable-length grams that allows record linkage to be performed in the embedded space. The frequent grams used to construct the embedding base are mined from the original database under the framework of differential privacy. Compared with the state-of-the-art secure matching schema [15], our approach provides formal, provable privacy guarantees and achieves better scalability while providing comparable utility.
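The idea of embedding strings over a base of frequent grams can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the helper names (`substrings`, `noisy_frequent_grams`, `embed`, `hamming`), the presence-indicator embedding, the Hamming comparison, and the `sensitivity` parameter are all assumptions made for illustration. In particular, thresholding Laplace-perturbed counts of only the observed grams is a simplification and does not by itself yield a formal differential privacy guarantee.

```python
import random
from collections import Counter


def substrings(s, max_len):
    """Return all contiguous grams of s with length 1..max_len."""
    return [s[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(s) - n + 1)]


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_frequent_grams(records, max_len, threshold, epsilon, sensitivity, rng):
    """Keep grams whose perturbed record count reaches the threshold.

    Assumption: `sensitivity` is supplied by the caller; this sketch does not
    derive it, and it only perturbs grams that actually occur in the data.
    """
    # Count, for each gram, the number of records that contain it.
    counts = Counter(g for r in records for g in set(substrings(r, max_len)))
    scale = sensitivity / epsilon
    return sorted(g for g, c in counts.items()
                  if c + laplace(scale, rng) >= threshold)


def embed(s, base):
    """Map a string to a 0/1 vector: one coordinate per gram in the base."""
    return [1 if g in s else 0 for g in base]


def hamming(u, v):
    """Number of coordinates in which two embedded vectors differ."""
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    records = ["johnson", "jonson", "smith"]
    base = noisy_frequent_grams(records, max_len=2, threshold=1.5,
                                epsilon=1.0, sensitivity=1.0, rng=rng)
    # Similar strings should land close together in the embedded space.
    print(base)
    print(hamming(embed("johnson", base), embed("jonson", base)))
```

In this sketch, two parties that agree on the same base can compare the embedded vectors rather than the raw strings; how the base is shared and how the comparison is secured is left to the protocol described in the paper.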
References
[1] A. Al-Lawati, D. Lee, and P. McDaniel. Blocking-aware private record linkage. In Proceedings of the 2nd international workshop on Information quality in information systems, IQIS '05, pages 59--68, New York, NY, USA, 2005. ACM.
[2] L. Bonomi, L. Xiong, R. Chen, and B. C. M. Fung. Privacy preserving record linkage via grams projections. eprint on arxiv.org, 2012.
[3] R. Chen, B. C. M. Fung, B. C. Desai, and N. M. Sossou. Differentially private transit data publication: A case study on the Montreal transportation system. In Proc. of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China, August 2012. ACM Press.
[4] T. Churches and P. Christen. Some methods for blindfolded record linkage. BMC Medical Informatics and Decision Making, 4(1):9, 2004.
[5] C. Dwork. Differential privacy. In ICALP, pages 1--12. Springer, 2006.
[6] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, pages 265--284, 2006.
[7] A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. IEEE Trans. Knowl. Data Eng., 19(1):1--16, 2007.
[8] L. O. Evangelista, E. Cortez, A. S. da Silva, and W. M. Jr. Adaptive and flexible blocking for record linkage tasks. JIDM, 1(2):167--182, 2010.
[9] G. Hjaltason and H. Samet. Properties of embedding methods for similarity searching in metric spaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(5):530--549, May 2003.
[10] A. Inan, M. Kantarcioglu, E. Bertino, and M. Scannapieco. A hybrid approach to private record linkage. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE '08, pages 496--505, Washington, DC, USA, 2008. IEEE Computer Society.
[11] A. Inan, M. Kantarcioglu, G. Ghinita, and E. Bertino. Private record matching using differential privacy. In Proceedings of the 13th International Conference on Extending Database Technology, EDBT '10, pages 123--134, New York, NY, USA, 2010. ACM.
[12] V. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707, 1966.
[13] Y. Lindell and B. Pinkas. Secure multiparty computation for privacy-preserving data mining. Cryptology ePrint Archive, Report 2008/197, 2008.
[14] F. D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD '09, pages 19--30, New York, NY, USA, 2009. ACM.
[15] M. Scannapieco, I. Figotin, E. Bertino, and A. K. Elmagarmid. Privacy preserving schema and data matching. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD '07, pages 653--664, New York, NY, USA, 2007. ACM.
[16] R. Schnell, T. Bachteler, and J. Reiher. Privacy-preserving record linkage using Bloom filters. BMC Medical Informatics and Decision Making, 9(1):41, 2009.
[17] C. Van Rijsbergen. Information retrieval. Butterworths, 1979.
[18] W. Winkler. Overview of record linkage and current research directions. Technical Report Statistics#2006-2, Statistical Research Division, U.S. Bureau of the Census, 2006.
[19] M. Yakout, M. J. Atallah, and A. Elmagarmid. Efficient private record linkage. Data Engineering, International Conference on, 0:1283--1286, 2009.
[20] A. C.-C. Yao. How to generate and exchange secrets. In Foundations of Computer Science, 1986. 27th Annual Symposium on, pages 162--167, Oct. 1986.
Index Terms
- Frequent grams based embedding for privacy preserving record linkage