ABSTRACT
One approach to analysis of private data is ε-differential privacy, a randomization-based approach that protects individual data items by injecting carefully limited noise into results. A challenge in applying this to private data analysis is that the noise added to the feature parameters is directly proportional to the number of parameters learned. While careful feature selection would alleviate this problem, the process of feature selection itself can reveal private information, requiring the application of differential privacy to the feature selection process. In this paper, we analyze the sensitivity of various feature selection techniques used in data mining and show that some of them are not suitable for differentially private analysis due to high sensitivity. We give experimental results showing the value of using low sensitivity feature selection techniques. We also show that the same concepts can be used to improve differentially private decision trees.
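The abstract's core observation is that when a total privacy budget ε must be split across the parameters being released, the Laplace noise added to each one grows with the number of parameters. A minimal sketch of this effect (the function names `laplace_release` and `noise_scale` are our own illustration, not from the paper; we assume an even budget split and count queries of sensitivity 1):

```python
import numpy as np

def laplace_release(value, sensitivity, epsilon, rng):
    """Standard Laplace mechanism: add noise with scale sensitivity/epsilon."""
    return value + rng.laplace(0.0, sensitivity / epsilon)

def noise_scale(sensitivity, epsilon_total, k):
    """Per-parameter Laplace scale when a total budget epsilon_total
    is divided evenly across k released parameters."""
    return sensitivity / (epsilon_total / k)

rng = np.random.default_rng(0)

# Releasing 100 parameters instead of 10 under the same total budget
# makes each released value ten times noisier.
print(noise_scale(1.0, 1.0, 10))   # 10.0
print(noise_scale(1.0, 1.0, 100))  # 100.0
noisy = laplace_release(42.0, 1.0, 0.1, rng)
```

This is why the paper argues for feature selection before learning: fewer surviving features means a larger per-parameter budget and less noise, provided the selection step itself is performed with a low-sensitivity, differentially private mechanism.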
Differentially Private Feature Selection for Data Mining