Abstract
The curse of dimensionality has remained a challenge for a wide variety of algorithms in data mining, clustering, classification, and privacy. Recently, it was shown that increasing dimensionality makes data resistant to effective privacy preservation. These theoretical results seem to suggest that the dimensionality curse is a fundamental barrier to privacy preservation. In practice, however, we show that some common properties of real data can be leveraged to greatly ameliorate the negative effects of the curse of dimensionality. In real data sets, many dimensions exhibit high levels of inter-attribute correlation. Such correlations enable a process known as vertical fragmentation, which decomposes the data into vertical subsets of smaller dimensionality. An information-theoretic criterion based on mutual information guides the vertical decomposition. This, in turn, enables an anonymization process that combines the results from multiple independent fragments. We present a general approach that can be applied to the k-anonymity, \(\ell \)-diversity, and t-closeness models. In the presence of inter-attribute correlations, this approach remains much more robust at higher dimensionality without losing accuracy. We present experimental results illustrating the effectiveness of the approach, which is resilient enough to prevent identity, attribute, and membership disclosure attacks.
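To make the role of the mutual-information criterion concrete, the following minimal Python sketch groups categorical attributes into vertical fragments by greedily merging columns that share high empirical mutual information. The mi_threshold parameter, the greedy placement rule, and the function names are illustrative assumptions for exposition only; they are not the decomposition algorithm developed in the paper.

import math
from collections import Counter

def mutual_information(xs, ys):
    # Empirical mutual information (in nats) between two categorical columns.
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def vertical_fragments(rows, mi_threshold=0.1):
    # Greedy grouping: an attribute joins an existing fragment if it shares at
    # least mi_threshold mutual information with some member of that fragment;
    # otherwise it starts a new fragment of its own.
    columns = list(zip(*rows))          # column-major view of the table
    fragments = []                      # each fragment is a list of column indices
    for j, col in enumerate(columns):
        for frag in fragments:
            if any(mutual_information(col, columns[k]) >= mi_threshold for k in frag):
                frag.append(j)
                break
        else:
            fragments.append([j])
    return fragments

# Toy table: attributes 0 and 1 are perfectly correlated, attribute 2 is independent.
rows = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
print(vertical_fragments(rows))         # e.g. [[0, 1], [2]]

Each resulting fragment would then be anonymized independently, and the anonymized fragments combined, in the spirit of the fragmentation-based approach summarized above.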









Notes
J48 is an open source implementation of C4.5 in Java, http://weka.sourceforge.net/doc/weka/classifiers/trees/J48.html.
Recall that fragmentation is the first step of the k-anonymity algorithm and the third step of the \(\ell \)-diversity algorithm.
Here, the power of an attacker refers to the number of quasi-identifier attributes that he or she is aware of.
The work in [12] only considers the \(\ell \)-diversity model, so it is not suitable for comparison with our work.
There are 7 classes; however, instances of only 6 classes appear in this training sample.
This data set has only 36 features.
We adopt distinct \(\ell \)-diversity in our experiments.
Note that this metric is independent of the anonymity degree (k or \(\ell \)) because it is calculated before the anonymization algorithm is applied to the data.
References
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: SIGMOD
Aggarwal CC (2005) On \(k\)-anonymity and the curse of dimensionality. In: VLDB
Aggarwal CC (2007) On randomization, public information, and the curse of dimensionality. In: ICDE
Aggarwal CC (2008) Privacy and the dimensionality curse. In: Aggarwal C, Yu PS (eds) Privacy preserving data mining: models and algorithms. Springer, Berlin
Agrawal S, Haritsa J (2005) A framework for high accuracy privacy-preserving data mining. In: ICDE
Aggarwal CC, Yu PS (2008) Privacy preserving data mining: models and algorithms. Springer, Berlin
Chow C-Y, Mokbel MF (2011) Trajectory privacy in location-based services and data publication. ACM SIGKDD Explor Newsl 13(1):19–29
Ciriani V, De Capitani di Vimercati S, Foresti S, Jajodia S, Paraboschi S, Samarati P (2010) Combining fragmentation and encryption to protect privacy in data storage. ACM Trans Inf Syst Secur 13(3):1–33
Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: CSB
Ganapathy V, Thomas D, Feder T, Garcia-Molina H, Motwani R (2011) Distributing data for secure database services. In: PAIS workshop
Ghinita G, Karras P, Kalnis P, Mamoulis N (2007) Fast data anonymization with low information loss. In: VLDB
Ghinita G, Tao Y, Kalnis P (2008) On the anonymization of sparse high-dimensional data. In: ICDE
Iyengar V (2002) Transforming data to satisfy privacy constraints. In: KDD
Kifer D (2009) Attacks on privacy and deFinetti’s theorem. In: SIGMOD
LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Mondrian multidimensional \(k\)-anonymity. In: ICDE
LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Workload-aware anonymization. In: KDD
LeFevre K, DeWitt D, Ramakrishnan R (2005) Incognito: efficient full-domain \(k\)-anonymity. In: SIGMOD
Li F, Sun J, Papadimitriou S, Mihaila G, Stanoi I (2007) Hiding in the crowd: privacy preservation on evolving streams through correlation tracking. In: ICDE
Li N, Li T, Venkatasubramanian S (2007) \(t\)-closeness: privacy beyond \(k\)-anonymity and \(\ell \)-diversity. In: ICDE
Li T, Li N, Zhang J, Molloy I (2012) Slicing: a new approach for privacy preserving data publishing. IEEE Trans Knowl Data Eng 24(3):561–574
Liu H, Motoda H (2007) Computational methods for feature selection. Chapman and Hall/CRC, London (data mining and knowledge discovery series)
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) \(\ell \)-diversity: privacy beyond \(k\)-anonymity. In: ICDE
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Samarati P (2001) Protecting respondents' identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Vaidya J, Clifton C (2002) Privacy-preserving association rule mining in vertically partitioned data. In: KDD
Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: VLDB
Wong W, Mamoulis N, Cheung D (2010) Non-homogeneous generalization in privacy preserving data publishing. In: SIGMOD
Xue M, Karras P, Raissi C, Vaidya J, Tan K (2012) Anonymizing set-valued data by nonreciprocal recoding. In: KDD
Terrovitis M, Mamoulis N, Kalnis P (2008) Privacy-preserving anonymization of set-valued data. In: VLDB
Xu Y, Wang K, Fu AW, Yu PS (2008) Anonymizing transaction databases for publication. In: KDD
Mohammed N, Fung B, Hung P, Lee C (2009) Anonymizing healthcare data: a case study on the blood transfusion service. In: KDD
Nergiz ME, Atzori M, Clifton C (2007) Hiding the presence of individuals from shared databases. In: SIGMOD
Zakerzadeh H, Aggarwal CC, Barker K (2014) Towards breaking the curse of dimensionality for high-dimensional privacy. In: SDM
Kifer D, Gehrke J (2006) Injecting utility into anonymized datasets. In: SIGMOD
Mohammed N, Fung B, Hung P, Lee C (2010) Centralized and distributed anonymization for high-dimensional healthcare data. ACM Trans Knowl Discov Data 4(4)
Terrovitis M, Liagouris J, Mamoulis N, Skiadopoulos S (2012) Privacy preservation by disassociation. In: VLDB
Dwork C (2006) Differential privacy. In: ICALP
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: TCC
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441
He Y, Naughton J (2009) Anonymization of set-valued data via top-down, local generalization. In: VLDB
Zakerzadeh H, Osborn SL (2013) Delay-sensitive approaches for anonymizing numerical streaming data. Int J Inf Sec 12(5):423–437
Cao J, Karras P (2012) Publishing microdata with a robust privacy guarantee. In: VLDB
Cite this article
Zakerzadeh, H., Aggarwal, C.C. & Barker, K. Managing dimensionality in data privacy anonymization. Knowl Inf Syst 49, 341–373 (2016). https://doi.org/10.1007/s10115-015-0906-8