ABSTRACT
Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. Most existing solutions that ensure ε-differential privacy are based on an interactive model, in which the data miner may only pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision-tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and outperforms existing solutions for classification analysis.
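To make the "generalize, then add noise" recipe concrete, the following is a minimal sketch (not the paper's algorithm) of the standard Laplace mechanism applied to a generalized histogram: records are coarsened along a hypothetical one-attribute taxonomy, grouped, and each group count is perturbed with Laplace(1/ε) noise. Since a count query has sensitivity 1, releasing the noisy histogram satisfies ε-differential privacy. The taxonomy, attribute, and function names here are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sample from the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def generalize(age: int) -> str:
    # Hypothetical taxonomy: coarsen an exact age into a range label.
    return "young" if age < 40 else "senior"

def private_counts(ages, epsilon: float, seed: int = 0):
    # Count records per generalized group, then add Laplace(1/epsilon)
    # noise to each count. A single count has sensitivity 1, so the
    # released histogram is epsilon-differentially private.
    rng = random.Random(seed)
    counts = {"young": 0, "senior": 0}
    for a in ages:
        counts[generalize(a)] += 1
    return {g: c + laplace_noise(1.0 / epsilon, rng) for g, c in counts.items()}

release = private_counts([23, 31, 45, 52, 67], epsilon=1.0)
print(release)
```

The noisy counts stay close to the true counts (2 "young", 3 "senior") in expectation, while any single record's presence changes each count by at most one, which the noise masks.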