Secure and efficient anonymization of distributed confidential databases

Herranz, Javier; Nin, Jordi

doi:10.1007/s10207-014-0237-x


Regular Contribution
Published: 23 April 2014

International Journal of Information Security, Volume 13, pages 497–512 (2014)

Javier Herranz¹ &
Jordi Nin²


Abstract

Let us consider the following situation: \(t\) entities (e.g., hospitals) hold different databases containing different records for the same type of confidential (e.g., medical) data. They want to deliver a protected version of this data to third parties (e.g., pharmaceutical researchers), preserving both the utility and the privacy of the original data. This can be done by applying a statistical disclosure control (SDC) method. One possibility is that each entity protects its own database individually, but this strategy provides less utility and privacy than a collective strategy where the entities cooperate, by means of a distributed protocol, to produce a global protected dataset. In this paper, we investigate the problem of distributed protocols for SDC protection methods. We propose a simple, efficient and secure distributed protocol for the specific SDC method of rank shuffling. We run experiments to evaluate the quality of this protocol and to compare the individual and collective strategies for protecting a distributed database. With respect to other distributed versions of SDC methods, the new protocol provides either more security or more efficiency, as we discuss throughout the paper.


Notes

The software is available at http://www.charm-crypto.com/.

References

[1] Akinyele, J.A., Garman, C., Miers, I., Pagano, M.W., Rushanan, M., Green, M., Rubin, A.D.: Charm: a framework for rapidly prototyping cryptosystems. J. Cryptogr. Eng. 3(2), 111–128 (2013)
[2] Beimel, A., Nissim, K., Omri, E.: Distributed private data analysis: simultaneously solving How and What. In: CRYPTO’08, Volume 5157 of Lecture Notes in Computer Science, pp. 451–468. Springer (2008)
[3] Brickell, J., Shmatikov, V.: Efficient anonymity preserving data collection. In: ACM SIGKDD, pp. 334–343. ACM Press (2006)
[4] Bunn, P., Ostrovsky, R.: Secure two-party \(k\)-means clustering. In: Proceedings of ACM Conference on Computer and Communications Security, pp. 486–497. ACM Press (2007)
[5] Chen, R., Mohammed, N., Fung, B.C.M., Desai, B.C., Xiong, L.: Publishing set-valued data via differential privacy. Proc. VLDB Endow. (PVLDB) 4(11), 1087–1098 (2011)
[6] Dalenius, T., Reiss, S.: Data-swapping: a technique for disclosure control. J. Stat. Plan. Inference 6, 73–85 (1982)
[7] Damgard, I., Fitzi, M., Kiltz, E., Nielsen, J., Toft, T.: Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation. In: Theory of Cryptography Conference, Volume 3876 of Lecture Notes in Computer Science, pp. 285–304. Springer (2006)
[8] Defays, D., Anwar, M.: Micro-aggregation: a generic method. In: Proceedings of the 2nd International Seminar on Statistical Confidentiality, pp. 69–78. (1995)
[9] Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180(15), 2834–2844 (2010)
[10] Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J., Sebé, F.: Efficient multivariate data-oriented microaggregation. Very Large Database J. 15, 355–369 (2006)
[11] Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
[12] Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 111–133. (2001)
[13] Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 91–110. (2001)
[14] Domingo-Ferrer, J., Torra, V., Mateo-Sanz, J.M., Sebé, F.: Systematic measures of re-identification risk based on the probabilistic links of the partially synthetic data back to the original microdata. Tech. Rep. Cornell Univ. (2005)
[15] Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Eurocrypt’06, Volume 4004 of Lecture Notes in Computer Science, pp. 486–503. Springer (2006)
[16] Dwork, C.: Differential privacy. In: ICALP’06 (2), Volume 4052 of Lecture Notes in Computer Science, pp. 1–12. Springer (2006)
[17] Dwork, C.: A firm foundation for private data analysis. Commun. ACM 54(1), 86–95 (2011)
[18] ElGamal, T.: A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31, 469–472 (1985)
[19] Gennaro, R., Jarecki, S., Krawczyk, H., Rabin, T.: Secure distributed key generation for discrete-log based cryptosystems. J. Cryptol. 20, 51–83 (2007)
[20] Heer, G.: A bootstrap procedure to preserve statistical confidentiality in contingency tables. In: Proceedings of the 1st International Seminar on Statistical Confidentiality, pp. 261–71. (1993)
[21] Herranz, J., Nin, J., Torra, V.: Distributed privacy-preserving methods for statistical disclosure control. In: Data Privacy Management and Autonomous Spontaneous Security, Volume 5939 of Lecture Notes in Computer Science, pp. 33–47. Springer (2010)
[22] Kim, J.: A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the ASA Section on Survey Research, Methodology, pp. 303–308. (1986)
[23] Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. Knowl. Data Eng. 17(7), 902–911 (2005)
[24] Li, N., Qardaji, W., Su, D.: On sampling, anonymization, and differential privacy: or, \(k\)-anonymization meets differential privacy. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security (ASIACCS), pp. 32–33. (2012)
[25] Mateo-Sanz, J.M., Domingo-Ferrer, J., Sebé, F.: Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min. Knowl. Disc. 11(2), 181–193 (2005)
[26] Muralidhar, K., Sarathy, R.: Data shuffling: a new masking approach for numerical data. Manage. Sci. 52(2), 658–670 (2006)
[27] Nin, J., Herranz, J., Torra, V.: Rethinking rank swapping to decrease disclosure risk. Data Knowl. Eng. 64(1), 346–364 (2008)
[28] Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Proceedings of Eurocrypt’99, Volume 1592 of Lecture Notes in Computer Science, pp. 223–238. Springer (1999)
[29] Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: \(k\)-anonymity and its enforcement through generalization and suppression. Tech. Rep., SRI Int. (1998)
[30] Sarathy, R., Muralidhar, K.: Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans. Data Priv. 4(1), 1–17 (2011)
[31] Soria, J., Domingo-Ferrer, J., Sánchez, D., Martínez, S.: Improving the utility of differentially private data releases via \(k\)-anonymity. In: Proceedings of TrustCom/ISPA/IUCC, pp. 372–379. (2013)
[32] Sweeney, L.: \(k\)-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
[33] Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. In: Lecture Notes in Statistics. Springer (2001)
[34] Zhong, S., Yang, Z., Chen, T.: \(k\)-Anonymous data collection. Inf. Sci. 179(17), 2948–2963 (2009)


Acknowledgments

Partial support by the Spanish program CONSOLIDER-INGENIO 2010, under project ARES (CSD2007-00004), is acknowledged. Javier Herranz holds a Ramón y Cajal grant from the Spanish MINECO, partially funded by the European Social Fund (ESF). The work of Jordi Nin is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557, and by the BSC-CNS Severo Ochoa program (SEV-2011-00067).

Author information

Authors and Affiliations

Departament de Matemàtica Aplicada 4, Universitat Politècnica de Catalunya - BarcelonaTech, Campus Nord, C/Jordi Girona 1-3, 08034 Barcelona, Spain
Javier Herranz
Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya - BarcelonaTech, Campus Nord, C/Jordi Girona 1-3, 08034 Barcelona, Spain
Jordi Nin


Corresponding author

Correspondence to Jordi Nin.

Appendix: Distributed versions of rank swapping cannot be secure

Let us recall the argument given in [21] to show that any distributed privacy-preserving (or multiparty) version of any SDC method in the swapping family can never offer the desired level of privacy, in our scenario where confidential attributes are not modified.

Recall that swapping methods work attribute by attribute. For simplicity we consider the case where \(X\) is partitioned between \(t=2\) entities, \(P_1\) and \(P_2\). Therefore, \(P_1\) knows all the attributes of some original records, whereas \(P_2\) knows all the attributes of the remaining records. Assume now that \(P_1\) and \(P_2\) jointly apply a distributed protocol which results in \(X' = \rho (X)\), where \(\rho \) is a perturbation method which swaps pairs of values of the same attribute.
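To make the setting concrete, the following is a minimal sketch of a perturbation \(\rho\) of this kind, assuming plain uniform random pairing of records for each non-confidential attribute. It is not the rank swapping method itself (which restricts swaps to values of nearby rank), and the attribute names, toy records and function names are hypothetical.

```python
import random

def swap_attribute(records, attr, rng):
    """Return a copy of `records` in which values of `attr` are swapped in random pairs."""
    out = [dict(r) for r in records]
    order = list(range(len(out)))
    rng.shuffle(order)
    # Pair consecutive indices of the shuffled order; with an odd number
    # of records, the last index keeps its original value.
    for a, b in zip(order[0::2], order[1::2]):
        out[a][attr], out[b][attr] = out[b][attr], out[a][attr]
    return out

def rho(records, non_confidential, seed=0):
    """Swap each non-confidential attribute independently; confidential ones are untouched."""
    rng = random.Random(seed)
    out = records
    for attr in non_confidential:
        out = swap_attribute(out, attr, rng)
    return out

# Hypothetical toy database: at1 and at2 are non-confidential, at3 is confidential.
X = [
    {"at1": 1, "at2": 4, "at3": "high"},
    {"at1": 2, "at2": 3, "at3": "low"},
    {"at1": 5, "at2": 6, "at3": "very high"},
    {"at1": 7, "at2": 8, "at3": "medium"},
]
X_prime = rho(X, ["at1", "at2"])
```

Each column of `X_prime` holds the same multiset of values as the corresponding column of `X`, and the confidential column stays aligned with the original record positions, which is exactly what the attack described next relies on.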

\(P_1\) knows which records in \(X'\) correspond to his original records, because the confidential attributes have not been modified. Let \(i\) be the index of one of \(P_2\)’s original records. If the protection method is secure, then \(P_1\) should not have a high probability of obtaining information on the confidential attributes of record \(i\), even after learning the original non-confidential attributes of this record. However, for each original non-confidential attribute value \(x_{ij}\) of record \(i\), entity \(P_1\) can look for it in \(X'\). With reasonably high probability, \(x_{ij}\) now sits in a record \(i' \ne i\) of \(X'\) that belongs to \(P_1\). If this is the case, \(P_1\) can look for his own value \(x_{i'j}\) in \(X'\); it must be in record \(i\), because the applied method only swaps pairs of values.

Once \(P_1\) has found the protected record \(i\) where \(x_{i'j}\) lies, he has linked the non-confidential attributes of a record belonging to \(P_2\) with the corresponding confidential attributes, thereby breaking the privacy of the system.
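The two lookups in this argument can be sketched as follows. This is an illustration under stated assumptions, not code from the paper: the function name, the record layout and the toy data are hypothetical, a single attribute is attacked, and the attack only reports a link when the displaced value is unique among the records that do not belong to \(P_1\).

```python
def link_by_swap(x_prime, own_original, attr, conf_attr, target_value):
    """Try to link `target_value` (a known value x_ij of some record of P_2)
    to that record's confidential attribute.

    x_prime      -- published dataset X', a list of dicts
    own_original -- {row index: original record} for the attacker's own rows
    Returns (row index i, confidential value) or None if no unique link is found.
    """
    for i_prime, orig in own_original.items():
        # Step 1: the target value now sits in one of the attacker's own rows.
        moved_in = x_prime[i_prime][attr] == target_value
        if moved_in and orig[attr] != target_value:
            displaced = orig[attr]  # the attacker's own value x_{i'j}
            # Step 2: find where the displaced value landed outside the attacker's rows.
            hits = [i for i, rec in enumerate(x_prime)
                    if i not in own_original and rec[attr] == displaced]
            if len(hits) == 1:
                return hits[0], x_prime[hits[0]][conf_attr]
    return None

# Hypothetical published dataset: at1 of rows 0 and 2 has been swapped.
X_prime = [
    {"at1": 5, "at3": "high"},
    {"at1": 2, "at3": "low"},
    {"at1": 1, "at3": "very high"},
    {"at1": 7, "at3": "medium"},
]
# P_1 owns rows 0 and 1 and remembers their original values.
p1_original = {0: {"at1": 1, "at3": "high"}, 1: {"at1": 2, "at3": "low"}}

# P_1 has learned that some record of P_2 originally had at1 = 5.
result = link_by_swap(X_prime, p1_original, "at1", "at3", target_value=5)
print(result)  # (2, 'very high')
```

The uniqueness check in step 2 reflects the caveat that repeated attribute values make the link ambiguous.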

A simple example of this kind of attack is illustrated in Fig. 5. Original records in boldface correspond to records of entity \(P_1\), whereas the rest of the original records correspond to entity \(P_2\). Once the distributed swapping method is applied, the protected dataset \(X'\) is publicly available. \(P_1\) realizes that only one record (his first one, with 1 as the value for \(at_1\) and 4 as the value for \(at_2\)) had ‘high’ as the value of attribute \(at_3\). In \(X'\), this record has 5 as the value for \(at_1\) and 6 as the value for \(at_2\), which do not correspond to any record of \(P_1\). This means that values 1 and 5 for \(at_1\) and values 4 and 6 for \(at_2\) have been swapped. As a consequence, \(P_1\) knows that the record (owned by \(P_2\)) with \(at_1=5\) has confidential attribute \(at_3=\) ‘very high’, and the record (owned by \(P_2\)) with \(at_2=6\) also has \(at_3=\) ‘very high’. Other combinations (with \(P_2\) as the attacker, as well) can be easily found.

Of course, if the values of an attribute are repeated across many records, this attack is less effective. On the other hand, running the attack on several attributes increases the success probability of the attacker \(P_1\). The conclusion is that perturbation methods in the swapping family are not suitable for the scenario where the original database is distributed among several entities.
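As a rough illustration of why attacking several attributes helps, assume (this is a simplification not made in the argument above) that the attack on attribute \(j\) succeeds with probability \(p_j\), independently for each of \(m\) attributes. The probability that at least one attribute yields a link is then

\[ \Pr[\text{success}] = 1 - \prod_{j=1}^{m} (1 - p_j). \]

For instance, with \(m = 3\) attributes and \(p_j = 0.3\) for each of them, this gives \(1 - 0.7^3 = 0.657\).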


About this article

Cite this article

Herranz, J., Nin, J. Secure and efficient anonymization of distributed confidential databases. Int. J. Inf. Secur. 13, 497–512 (2014). https://doi.org/10.1007/s10207-014-0237-x


Issue Date: November 2014
