DOI: 10.1145/2020408.2020487
research-article

Differentially private data release for data mining

Published: 21 August 2011

ABSTRACT

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. Most of the existing solutions that ensure ε-differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.
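
The "generalize, then add noise" idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the generalization below is a fixed, data-independent bucketing of a single attribute rather than the probabilistic generalization the paper describes, and the names generalize_age and noisy_group_counts are hypothetical. It only shows the noise step under the standard Laplace mechanism, assuming neighboring datasets differ by adding or removing one record, so that a histogram over disjoint groups has sensitivity 1.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def generalize_age(age, width=10):
    """Map an exact age to a fixed-width interval label (assumed hierarchy)."""
    low = (age // width) * width
    return f"[{low}-{low + width})"


def noisy_group_counts(groups, epsilon, domain, rng):
    """Release a noisy count for every group in the domain.

    Noise is added to empty groups too; otherwise the mere presence of a
    group in the output would reveal that some record falls in it.
    """
    true_counts = Counter(groups)
    scale = 1.0 / epsilon  # sensitivity of a disjoint-group histogram is 1
    return {g: true_counts.get(g, 0) + laplace_noise(scale, rng) for g in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 27, 31, 38, 45]
    # The domain is fixed in advance and does not depend on the data.
    domain = [generalize_age(a) for a in range(20, 50, 10)]
    generalized = [generalize_age(a) for a in ages]
    for group, count in noisy_group_counts(generalized, 1.0, domain, rng).items():
        print(group, round(count, 2))
```

A smaller ε gives a larger noise scale and hence stronger privacy at the cost of less accurate counts; a data miner would then train a classifier on the noisy group counts rather than on the raw records.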

Published in

KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2011
1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,133 of 8,635 submissions (13%)
