Enhancing data utility in differential privacy via microaggregation-based $k$-anonymity

Soria-Comas, Jordi; Domingo-Ferrer, Josep; Sánchez, David; Martínez, Sergio

doi:10.1007/s00778-014-0351-4

Regular Paper
Published: 13 February 2014

Volume 23, pages 771–794, (2014)
The VLDB Journal

Jordi Soria-Comas¹, Josep Domingo-Ferrer¹, David Sánchez¹ & Sergio Martínez¹

Abstract

It is not uncommon in the data anonymization literature to contrast the “old” $k$-anonymity model with the “new” differential privacy model, which offers more robust privacy guarantees. Yet it is often overlooked that the utility of the anonymized results provided by differential privacy is quite limited, either because of the amount of noise that must be added to the output or because utility can only be guaranteed for a restricted type of query. This is in contrast with $k$-anonymity mechanisms, which make no assumptions about the uses of the anonymized data and focus on preserving data utility from a general perspective. In this paper, we show that a synergy between differential privacy and $k$-anonymity can be found: $k$-anonymity can help improve the utility of differentially private responses to arbitrary queries. We devote special attention to improving the utility of differentially private published data sets. Specifically, we show that the amount of noise required to fulfill $\varepsilon$-differential privacy can be reduced if noise is added to a $k$-anonymous version of the data set, where $k$-anonymity is reached through a specially designed microaggregation of all attributes. As a result of this noise reduction, the general analytical utility of the anonymized output is increased. The theoretical benefits of our proposal are illustrated in a practical setting with an empirical evaluation on three data sets.
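
The noise-reduction claim in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not detail the specially designed microaggregation, so the code below assumes a single numeric attribute bounded in `[lower, upper]`, the standard Laplace mechanism, and a fixed, data-independent grouping into blocks of `k` records as a stand-in. All function names are hypothetical. Under those assumptions, replacing one record moves one group mean by at most `(upper - lower) / k`, so the Laplace scale shrinks by a factor of `k` compared with perturbing each record directly.

```python
import random
from typing import List


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_means(values: List[float], k: int, lower: float, upper: float,
                      epsilon: float, rng: random.Random) -> List[float]:
    """Average fixed blocks of k records, then add Laplace noise to each mean.

    Assumption: the grouping depends only on record position, never on the
    data, so replacing one record changes exactly one mean by at most
    (upper - lower) / k.
    """
    if k < 1 or len(values) % k != 0:
        raise ValueError("len(values) must be a positive multiple of k")
    scale = laplace_scale((upper - lower) / k, epsilon)
    noisy = []
    for start in range(0, len(values), k):
        # Clamp to the declared bounds so the sensitivity argument holds.
        group = [min(max(v, lower), upper) for v in values[start:start + k]]
        noisy.append(sum(group) / k + sample_laplace(scale, rng))
    return noisy


if __name__ == "__main__":
    lower, upper, epsilon, k = 0.0, 100.0, 1.0, 10
    print(laplace_scale(upper - lower, epsilon))        # per-record scale: 100.0
    print(laplace_scale((upper - lower) / k, epsilon))  # per-group-mean scale: 10.0
    data = [float(v) for v in range(0, 100, 5)]         # 20 toy records
    print(noisy_group_means(data, k, lower, upper, epsilon, random.Random(0)))
```

A grouping driven by record similarity, as in ordinary microaggregation, can change when a single record changes, which is presumably why the abstract calls for a specially designed variant; the fixed blocks above sidestep that issue rather than solve it.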

Acknowledgments

This work was partly supported by the Government of Catalonia under grant 2009 SGR 1135, by the Spanish Government through projects TIN2011-27076-C03-01 “CO-PRIVACY,” TIN2012-32757 “ICWT,” IPT2012-0603-430000 “BallotNext” and CONSOLIDER INGENIO 2010 CSD2007-00004 “ARES,” and by the European Commission under FP7 projects “DwB” and “Inter-Trust.” The second author is partially supported as an ICREA Acadèmia researcher by the Government of Catalonia. The authors are with the UNESCO Chair in Data Privacy, but they are solely responsible for the views expressed in this paper, which do not necessarily reflect the position of UNESCO nor commit that organization.

Author information

Authors and Affiliations

Department of Computer Science and Mathematics, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007 Tarragona, Catalonia, Spain
Jordi Soria-Comas, Josep Domingo-Ferrer, David Sánchez & Sergio Martínez

Corresponding author

Correspondence to David Sánchez.

Electronic supplementary material

Supplementary material 1 (pdf 192 KB)

About this article

Cite this article

Soria-Comas, J., Domingo-Ferrer, J., Sánchez, D. et al. Enhancing data utility in differential privacy via microaggregation-based $k$-anonymity. The VLDB Journal 23, 771–794 (2014). https://doi.org/10.1007/s00778-014-0351-4

Received: 24 April 2013
Revised: 10 December 2013
Accepted: 18 January 2014
Published: 13 February 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s00778-014-0351-4
