ABSTRACT
Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. Most existing solutions that ensure ε-differential privacy are based on an interactive model, in which the data miner may only pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision-tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and outperforms existing solutions for classification analysis.
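To make the "generalize, then add noise" recipe concrete, the following is a minimal sketch (not the paper's algorithm) of the standard Laplace mechanism applied to a generalized histogram: records are coarsened along a hypothetical one-attribute taxonomy, grouped, and each group count is perturbed with Laplace(1/ε) noise. Since a count query has sensitivity 1, releasing the noisy histogram satisfies ε-differential privacy. The taxonomy, attribute, and function names here are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sample from the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def generalize(age: int) -> str:
    # Hypothetical taxonomy: coarsen an exact age into a range label.
    return "young" if age < 40 else "senior"

def private_counts(ages, epsilon: float, seed: int = 0):
    # Count records per generalized group, then add Laplace(1/epsilon)
    # noise to each count. A single count has sensitivity 1, so the
    # released histogram is epsilon-differentially private.
    rng = random.Random(seed)
    counts = {"young": 0, "senior": 0}
    for a in ages:
        counts[generalize(a)] += 1
    return {g: c + laplace_noise(1.0 / epsilon, rng) for g, c in counts.items()}

release = private_counts([23, 31, 45, 52, 67], epsilon=1.0)
print(release)
```

The noisy counts stay close to the true counts (2 "young", 3 "senior") in expectation, while any single record's presence changes each count by at most one, which the noise masks.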