Secure and efficient anonymization of distributed confidential databases

  • Regular Contribution
International Journal of Information Security

Abstract

Let us consider the following situation: \(t\) entities (e.g., hospitals) hold different databases containing different records for the same type of confidential (e.g., medical) data. They want to deliver a protected version of this data to third parties (e.g., pharmaceutical researchers), preserving both the utility and the privacy of the original data. This can be done by applying a statistical disclosure control (SDC) method. One possibility is that each entity protects its own database individually, but this strategy provides less utility and privacy than a collective strategy where the entities cooperate, by means of a distributed protocol, to produce a global protected dataset. In this paper, we investigate the problem of distributed protocols for SDC protection methods. We propose a simple, efficient and secure distributed protocol for the specific SDC method of rank shuffling. We run some experiments to evaluate the quality of this protocol and to compare the individual and collective strategies for solving the problem of protecting a distributed database. With respect to other distributed versions of SDC methods, the new protocol provides either more security or more efficiency, as we discuss throughout the paper.

Notes

  1. The software is available at http://www.charm-crypto.com/.

References

  1. Akinyele, J.A., Garman, C., Miers, I., Pagano, M.W., Rushanan, M., Green, M., Rubin, A.D.: Charm: a framework for rapidly prototyping cryptosystems. J. Cryptogr. Eng. 3(2), 111–128 (2013)

  2. Beimel, A., Nissim, K., Omri, E.: Distributed private data analysis: simultaneously solving How and What. In: CRYPTO’08, Volume 5157 of Lecture Notes in Computer Science, pp. 451–468. Springer (2008)

  3. Brickell, J., Shmatikov, V.: Efficient anonymity preserving data collection. In: ACM SIGKDD, pp. 334–343. ACM Press (2006)

  4. Bunn, P., Ostrovsky, R.: Secure two-party \(k\)-means clustering. In: Proceedings of ACM Conference on Computer and Communications Security, pp. 486–497. ACM Press (2007)

  5. Chen, R., Mohammed, N., Fung, B.C.M., Desai, B.C., Xiong, L.: Publishing set-valued data via differential privacy. Proc. VLDB Endow. (PVLDB) 4(11), 1087–1098 (2011)

  6. Dalenius, T., Reiss, S.: Data-swapping: a technique for disclosure control. J. Stat. Plan. Inference 6, 73–85 (1982)

  7. Damgard, I., Fitzi, M., Kiltz, E., Nielsen, J., Toft, T.: Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation. In: Theory of Cryptography Conference, Volume 3876 of Lecture Notes in Computer Science, pp. 285–304. Springer (2006)

  8. Defays, D., Anwar, M.: Micro-aggregation: a generic method. In: Proceedings of the 2nd International Seminar on Statistical Confidentiality, pp. 69–78. (1995)

  9. Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180(15), 2834–2844 (2010)

  10. Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J., Sebé, F.: Efficient multivariate data-oriented microaggregation. Very Large Database J. 15, 355–369 (2006)

  11. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)

  12. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 111–133. (2001)

  13. Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 91–110. (2001)

  14. Domingo-Ferrer, J., Torra, V., Mateo-Sanz, J.M., Sebé, F.: Systematic measures of re-identification risk based on the probabilistic links of the partially synthetic data back to the original microdata. Tech. Rep. Cornell Univ. (2005)

  15. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Eurocrypt’06, Volume 4004 of Lecture Notes in Computer Science, pp. 486–503. Springer (2006)

  16. Dwork, C.: Differential privacy. In: ICALP’06 (2), Volume 4052 of Lecture Notes in Computer Science, pp. 1–12. Springer (2006)

  17. Dwork, C.: A firm foundation for private data analysis. Commun. ACM 54(1), 86–95 (2011)

  18. ElGamal, T.: A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31, 469–472 (1985)

  19. Gennaro, R., Jarecki, S., Krawczyk, H., Rabin, T.: Secure distributed key generation for discrete-log based cryptosystems. J. Cryptol. 20, 51–83 (2007)

  20. Heer, G.: A bootstrap procedure to preserve statistical confidentiality in contingency tables. In: Proceedings of the 1st International Seminar on Statistical Confidentiality, pp. 261–271. (1993)

  21. Herranz, J., Nin, J., Torra, V.: Distributed privacy-preserving methods for statistical disclosure control. In: Data Privacy Management and Autonomous Spontaneous Security, Volume 5939 of Lecture Notes in Computer Science, pp. 33–47. Springer (2010)

  22. Kim, J.: A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the ASA Section on Survey Research, Methodology, pp. 303–308. (1986)

  23. Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. Knowl. Data Eng. 17(7), 902–911 (2005)

  24. Li, N., Qardaji, W., Su, D.: On sampling, anonymization, and differential privacy: or, \(k\)-anonymization meets differential privacy. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security (ASIACCS), pp. 32–33. (2012)

  25. Mateo-Sanz, J.M., Domingo-Ferrer, J., Sebé, F.: Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min. Knowl. Disc. 11(2), 181–193 (2005)

  26. Muralidhar, K., Sarathy, R.: Data shuffling: a new masking approach for numerical data. Manage. Sci. 52(2), 658–670 (2006)

  27. Nin, J., Herranz, J., Torra, V.: Rethinking rank swapping to decrease disclosure risk. Data Knowl. Eng. 64(1), 346–364 (2008)

  28. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Proceedings of Eurocrypt’99, Volume 1592 of Lecture Notes in Computer Science, pp. 223–238. Springer (1999)

  29. Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: \(k\)-anonymity and its enforcement through generalization and suppression. Tech. Rep., SRI Int. (1998)

  30. Sarathy, R., Muralidhar, K.: Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans. Data Priv. 4(1), 1–17 (2011)

  31. Soria, J., Domingo-Ferrer, J., Sánchez, D., Martínez, S.: Improving the utility of differentially private data releases via \(k\)-anonymity. In: Proceedings of TrustCom/ISPA/IUCC, pp. 372–379. (2013)

  32. Sweeney, L.: \(k\)-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)

  33. Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. In: Lecture Notes in Statistics. Springer (2001)

  34. Zhong, S., Yang, Z., Chen, T.: \(k\)-Anonymous data collection. Inf. Sci. 179(17), 2948–2963 (2009)

Acknowledgments

Partial support by the Spanish program CONSOLIDER-INGENIO 2010, under project ARES (CSD2007-00004) is acknowledged. Javier Herranz enjoys a Ramón y Cajal grant, partially funded by the European Social Fund (ESF), from Spanish MINECO Ministry. The work of Jordi Nin is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557, and by the BSC-CNS Severo Ochoa program (SEV-2011-00067).

Author information

Corresponding author

Correspondence to Jordi Nin.

Appendix: Distributed versions of rank swapping cannot be secure

We recall the argument given in [21], which shows that no distributed privacy-preserving (or multiparty) version of an SDC method in the swapping family can offer the desired level of privacy in our scenario, where confidential attributes are not modified.

Recall that swapping methods work attribute by attribute. For simplicity, we consider the case where \(X\) is partitioned between \(t=2\) entities, \(P_1\) and \(P_2\). Thus, \(P_1\) knows all the attributes of some of the original records, whereas \(P_2\) knows all the attributes of the remaining records. Assume now that \(P_1\) and \(P_2\) jointly run a distributed protocol that outputs \(X' = \rho (X)\), where \(\rho \) is a perturbation method that swaps pairs of values of the same attribute.

\(P_1\) knows which records in \(X'\) correspond to his original records, because the confidential attributes have not been modified. Let \(i\) be the index of one of \(P_2\)’s original records. If the protection method is secure, then \(P_1\) should be unlikely to learn anything about the confidential attributes of record \(i\), even after obtaining the original non-confidential attributes of this record. However, for each original non-confidential attribute value \(x_{ij}\) of record \(i\), entity \(P_1\) can look for it in \(X'\). With reasonably high probability, \(x_{ij}\) now sits in a record \(i' \ne i\) of \(X'\) that belongs to \(P_1\). In that case, \(P_1\) can look for his own original value \(x_{i'j}\) in \(X'\); it is certain to be in record \(i\), because the applied method swaps values pairwise.

Once \(P_1\) has found the protected record \(i\) where \(x_{i'j}\) lies, he has linked the non-confidential attributes of a record belonging to \(P_2\) to the corresponding confidential attributes, thereby breaking the privacy of the system.
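The argument can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the protocol or the data of [21]: the helper names `apply_swaps` and `swap_attack` are hypothetical, there is a single non-confidential attribute \(j\), its values are all distinct, and \(P_1\) is assumed to already know which rows of \(X'\) are his own.

```python
from typing import Dict, List, Tuple

# One row = (value of non-confidential attribute j, confidential value).
Row = Tuple[int, str]


def apply_swaps(rows: List[Row], pairs: List[Tuple[int, int]]) -> List[Row]:
    """Toy swapping perturbation (hypothetical helper): exchange the
    attribute-j values of the given row pairs. Confidential values stay put."""
    values = [v for v, _ in rows]
    for a, b in pairs:
        values[a], values[b] = values[b], values[a]
    return [(values[k], conf) for k, (_, conf) in enumerate(rows)]


def swap_attack(own_original: Dict[int, int], protected: List[Row]) -> Dict[int, str]:
    """Attack run by P_1 (hypothetical helper).

    own_original maps the row indices owned by P_1 to his original values.
    Returns {original value of a P_2 record: its confidential value}.
    """
    leaked: Dict[int, str] = {}
    for i_prime, own_value in own_original.items():
        seen_value, _ = protected[i_prime]
        if seen_value == own_value:
            continue  # this value was not swapped
        # P_1's original value now sits in the row it was swapped with.
        candidates = [
            i for i, (v, _) in enumerate(protected)
            if v == own_value and i not in own_original
        ]
        if len(candidates) == 1:  # unambiguous only if values are distinct
            leaked[seen_value] = protected[candidates[0]][1]
    return leaked


# Toy data (assumed for illustration): rows 0-1 belong to P_1, rows 2-3 to P_2.
original = [(1, "high"), (2, "low"), (5, "very high"), (7, "medium")]
protected = apply_swaps(original, [(0, 2)])  # swap the values 1 and 5
leaked = swap_attack({0: 1, 1: 2}, protected)
print(leaked)
```

In this toy run, \(P_1\) sees the foreign value 5 in his own first row, finds his original value 1 in a row of \(P_2\), and reads the confidential value stored there. A swap between two rows of \(P_1\) leaks nothing, because the candidate list excludes his own rows.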

A simple example of this kind of attack is illustrated in Fig. 5. Original records in boldface belong to entity \(P_1\), whereas the remaining original records belong to entity \(P_2\). Once the distributed swapping method has been applied, the protected dataset \(X'\) is publicly available. \(P_1\) observes that only one record (his first one, with 1 as the value for \(at_1\) and 4 as the value for \(at_2\)) had ‘high’ as the value of attribute \(at_3\). In \(X'\), this record has 5 as the value for \(at_1\) and 6 as the value for \(at_2\), which do not correspond to any record of \(P_1\). This means that values 1 and 5 for \(at_1\) and values 4 and 6 for \(at_2\) have been swapped. As a consequence, \(P_1\) knows that the record (owned by \(P_2\)) with \(at_1=5\) has confidential attribute \(at_3=\) ‘very high’, and the record (owned by \(P_2\)) with \(at_2=6\) also has \(at_3=\) ‘very high’. Other combinations (with \(P_2\) as the attacker, as well) can be easily found.

Fig. 5 Example: insecurity of distributed rank swapping

Of course, if the values of an attribute are repeated across many records, this attack is less effective. On the other hand, by running the attack on several attributes, the attacker \(P_1\) increases his success probability. The conclusion is that perturbation methods in the swapping family are not suitable for the scenario where the original database is distributed among several entities.
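As a rough way to see why several attributes help the attacker, suppose (an assumption made here only for illustration, not a result taken from [21]) that, for a fixed target record, the attack succeeds on attribute \(j\) with probability \(p_j\), independently across the \(m\) non-confidential attributes. The symbols \(p_j\) and \(m\) are introduced only for this sketch. Then

\[ \Pr[\text{success on at least one attribute}] = 1 - \prod_{j=1}^{m} (1 - p_j), \]

which never decreases as \(m\) grows. For instance, \(p_j = 1/2\) for each of \(m = 3\) attributes gives \(1 - (1/2)^3 = 7/8\).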

Cite this article

Herranz, J., Nin, J. Secure and efficient anonymization of distributed confidential databases. Int. J. Inf. Secur. 13, 497–512 (2014). https://doi.org/10.1007/s10207-014-0237-x
