
A Constraint Satisfaction Cryptanalysis of Bloom Filters in Private Record Linkage

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6794)

Abstract

For over fifty years, “record linkage” procedures have been refined to integrate data in the face of typographical and semantic errors. These procedures are traditionally performed over personal identifiers (e.g., names), but in modern decentralized environments, privacy concerns have led to regulations that require the obfuscation of such attributes. Various techniques have been proposed to resolve this tension, including secure multi-party computation protocols; however, such protocols are computationally intensive and do not scale to real-world linkage scenarios. More recently, procedures based on Bloom filter encoding (BFE) have gained traction in various applications, such as healthcare, where they yield highly accurate record linkage results in a reasonable amount of time. Though promising, this emerging model has not yet been subjected to any formal security analysis, which is of concern considering the sensitivity of the corresponding data. In this paper, we introduce a novel attack, based on constraint satisfaction, to provide a rigorous analysis of BFE, along with guidelines for mitigating the risk posed by the attack. In addition, we conduct an empirical analysis with data derived from public voter records to illustrate the feasibility of the attack. Our investigations show that the parameters of the BFE protocol can be configured to make it relatively resilient to the proposed attack without a significant reduction in record linkage performance.
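The abstract does not describe the encoding itself. For reference, the following is a minimal sketch of bigram-based Bloom filter encoding in the style of Schnell, Bachteler, and Reiher (2009), with similarity scored by the Dice coefficient. The filter length `m`, the number of hash functions `k`, the `secret` key, and the helper names `bigrams`, `bloom_encode`, and `dice` are hypothetical. Salted SHA-256 is swapped in for the keyed hashing used in published BFE schemes. This illustrates the general technique and is not the exact construction evaluated in this paper.

```python
import hashlib


def bigrams(value: str) -> set[str]:
    """Split a string into character bigrams, padding both ends with a space."""
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, m: int = 100, k: int = 5, secret: str = "shared-key") -> set[int]:
    """Return the positions of the 1-bits in an m-bit Bloom filter for `value`.

    Assumption: the k hash functions are simulated by SHA-256 over a secret
    prefix plus the hash index. This stands in for the keyed hashing used in
    published BFE schemes and is not the paper's exact construction.
    """
    positions: set[int] = set()
    for gram in bigrams(value):
        for i in range(k):
            digest = hashlib.sha256(f"{secret}|{i}|{gram}".encode("utf-8")).digest()
            # Map the first 8 bytes of the digest to a bit position in [0, m).
            positions.add(int.from_bytes(digest[:8], "big") % m)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two filters given as sets of 1-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith, smyth, jones = (bloom_encode(n) for n in ("Smith", "Smyth", "Jones"))
    print(f"Smith vs Smyth: {dice(smith, smyth):.2f}")
    print(f"Smith vs Jones: {dice(smith, jones):.2f}")
```

Similar spellings such as "Smith" and "Smyth" share most of their bigrams, and therefore most of their 1-bits, which is what lets BFE tolerate typographical errors. The same determinism means that identical names always produce identical filters, so name frequencies in the population carry over to filter frequencies in the encoded data.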





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kuzu, M., Kantarcioglu, M., Durham, E., Malin, B. (2011). A Constraint Satisfaction Cryptanalysis of Bloom Filters in Private Record Linkage. In: Fischer-Hübner, S., Hopper, N. (eds) Privacy Enhancing Technologies. PETS 2011. Lecture Notes in Computer Science, vol 6794. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22263-4_13


  • DOI: https://doi.org/10.1007/978-3-642-22263-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22262-7

  • Online ISBN: 978-3-642-22263-4

  • eBook Packages: Computer Science, Computer Science (R0)
