DOI: 10.1145/2396761.2398480
short-paper

Frequent grams based embedding for privacy preserving record linkage

Published: 29 October 2012

ABSTRACT

In this paper, we study the problem of privacy preserving record linkage, which aims to perform record linkage without revealing anything about the non-linked records. We propose a new secure embedding strategy based on frequent variable length grams, which allows record linkage on the embedded space. The frequent grams used for constructing the embedding base are mined from the original database under the framework of differential privacy. Compared with the state-of-the-art secure matching schema [15], our approach provides formal, provable privacy guarantees and achieves better scalability while providing comparable utility.
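
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It assumes that "frequent grams" are substrings whose noisy record count clears a threshold, that the noise is Laplace noise scaled to a simple sensitivity bound, that candidates are enumerated from a fixed alphabet rather than from the data, and that the embedding is a 0/1 presence vector compared by Hamming distance. All function names and parameters (`mine_base`, `embed`, `threshold`, `max_record_len`) are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def grams(s, min_len=1, max_len=2):
    """Distinct substrings of s whose length lies in [min_len, max_len]."""
    return {s[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(s) - n + 1)}


def mine_base(records, epsilon, threshold, alphabet, max_len=2,
              max_record_len=20, rng=None):
    """Pick grams whose Laplace-noised record count reaches the threshold.

    Illustrative only: an assumed mining step, not a vetted
    differentially private implementation.
    """
    rng = rng or random.Random()
    # Truncating records bounds how many gram counts one record can change
    # (assumes max_record_len >= max_len).
    sensitivity = sum(max_record_len - n + 1 for n in range(1, max_len + 1))
    scale = sensitivity / epsilon
    counts = Counter(g for r in records
                     for g in grams(r[:max_record_len], 1, max_len))
    # Candidates come from the alphabet, so the candidate set is data-independent.
    candidates = ("".join(p) for n in range(1, max_len + 1)
                  for p in product(alphabet, repeat=n))
    base = []
    for g in candidates:
        # The difference of two exponential draws is a Laplace(0, scale) draw.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        if counts[g] + noise >= threshold:
            base.append(g)
    return base


def embed(record, base):
    """Map a string to a 0/1 vector: one coordinate per base gram."""
    longest = max(map(len, base), default=1)
    present = grams(record, 1, longest)
    return [1 if g in present else 0 for g in base]


def hamming(u, v):
    """Number of coordinates in which two embedded records differ."""
    return sum(a != b for a, b in zip(u, v))
```

In such a setup, two parties holding the same base could embed their records locally and compare only the vectors, declaring a match when the Hamming distance falls below a chosen cutoff. How the base is shared and how the comparison is secured are outside what this sketch covers.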

References

  1. A. Al-Lawati, D. Lee, and P. McDaniel. Blocking-aware private record linkage. In Proceedings of the 2nd international workshop on Information quality in information systems, IQIS '05, pages 59--68, New York, NY, USA, 2005. ACM.
  2. L. Bonomi, L. Xiong, R. Chen, and B. C. M. Fung. Privacy preserving record linkage via grams projections. eprint on arxiv.org, 2012.
  3. R. Chen, B. C. M. Fung, B. C. Desai, and N. M. Sossou. Differentially private transit data publication: A case study on the Montreal transportation system. In Proc. of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China, August 2012. ACM Press.
  4. T. Churches and P. Christen. Some methods for blindfolded record linkage. BMC Medical Informatics and Decision Making, 4(1):9, 2004.
  5. C. Dwork. Differential privacy. In ICALP, pages 1--12. Springer, 2006.
  6. C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, pages 265--284, 2006.
  7. A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. IEEE Trans. Knowl. Data Eng., 19(1):1--16, 2007.
  8. L. O. Evangelista, E. Cortez, A. S. da Silva, and W. M. Jr. Adaptive and flexible blocking for record linkage tasks. JIDM, 1(2):167--182, 2010.
  9. G. Hjaltason and H. Samet. Properties of embedding methods for similarity searching in metric spaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(5):530--549, May 2003.
  10. A. Inan, M. Kantarcioglu, E. Bertino, and M. Scannapieco. A hybrid approach to private record linkage. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE '08, pages 496--505, Washington, DC, USA, 2008. IEEE Computer Society.
  11. A. Inan, M. Kantarcioglu, G. Ghinita, and E. Bertino. Private record matching using differential privacy. In Proceedings of the 13th International Conference on Extending Database Technology, EDBT '10, pages 123--134, New York, NY, USA, 2010. ACM.
  12. V. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707, 1966.
  13. Y. Lindell and B. Pinkas. Secure multiparty computation for privacy-preserving data mining. Cryptology ePrint Archive, Report 2008/197, 2008.
  14. F. D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD '09, pages 19--30, New York, NY, USA, 2009. ACM.
  15. M. Scannapieco, I. Figotin, E. Bertino, and A. K. Elmagarmid. Privacy preserving schema and data matching. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD '07, pages 653--664, New York, NY, USA, 2007. ACM.
  16. R. Schnell, T. Bachteler, and J. Reiher. Privacy-preserving record linkage using bloom filters. BMC Medical Informatics and Decision Making, 9(1):41, 2009.
  17. C. Van Rijsbergen. Information retrieval. Butterworths, 1979.
  18. W. Winkler. Overview of record linkage and current research directions. Technical Report Statistics #2006-2, Statistical Research Division, U.S. Bureau of the Census, 2006.
  19. M. Yakout, M. J. Atallah, and A. Elmagarmid. Efficient private record linkage. Data Engineering, International Conference on, 0:1283--1286, 2009.
  20. A. C.-C. Yao. How to generate and exchange secrets. In Foundations of Computer Science, 1986. 27th Annual Symposium on, pages 162--167, Oct. 1986.

Published in

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
