DOI: 10.1145/3180445.3180452 · ACM CODASPY conference proceedings
research-article

Differentially Private Feature Selection for Data Mining

Published: 21 March 2018

ABSTRACT

One approach to analysis of private data is ε-differential privacy, a randomization-based approach that protects individual data items by injecting carefully limited noise into results. A challenge in applying this to private data analysis is that the noise added to the feature parameters is directly proportional to the number of parameters learned. While careful feature selection would alleviate this problem, the process of feature selection itself can reveal private information, requiring the application of differential privacy to the feature selection process. In this paper, we analyze the sensitivity of various feature selection techniques used in data mining and show that some of them are not suitable for differentially private analysis due to high sensitivity. We give experimental results showing the value of using low sensitivity feature selection techniques. We also show that the same concepts can be used to improve differentially private decision trees.
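The abstract makes two mechanical points: splitting a privacy budget ε across many learned parameters inflates the noise on each one, and feature selection must itself be privatized, with its usefulness depending on the sensitivity of the selection score. The Python sketch below illustrates both under stated assumptions. It uses the standard Laplace mechanism with an even budget split (sequential composition) and a top-k selection built from repeated rounds of the standard exponential mechanism. The function names (`per_parameter_scale`, `laplace_noise`, `noisy_parameters`, `private_top_k`) and the feature scores are hypothetical. This is not the paper's method: the specific feature selection scores whose sensitivities the paper analyzes are not reproduced here, and each score's sensitivity is simply assumed to be known.

```python
import math
import random
from typing import Dict, List, Sequence


def per_parameter_scale(d: int, epsilon: float, sensitivity: float) -> float:
    """Laplace scale for each of d parameters when epsilon is split evenly.

    Each parameter gets epsilon / d (assumed sequential composition), so its
    scale is sensitivity / (epsilon / d) = d * sensitivity / epsilon.
    """
    return d * sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """One sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # max() guards the single endpoint u = -0.5, where log(0) would fail.
    return -scale * math.copysign(1.0, u) * math.log(max(1e-300, 1.0 - 2.0 * abs(u)))


def noisy_parameters(values: Sequence[float], epsilon: float,
                     sensitivity: float, rng: random.Random) -> List[float]:
    """Release every parameter with Laplace noise under a total budget epsilon."""
    scale = per_parameter_scale(len(values), epsilon, sensitivity)
    return [v + laplace_noise(scale, rng) for v in values]


def private_top_k(scores: Dict[str, float], k: int, epsilon: float,
                  sensitivity: float, rng: random.Random) -> List[str]:
    """Select k features with k rounds of the exponential mechanism.

    Each round spends epsilon / k and samples a remaining feature with
    probability proportional to exp(eps_round * score / (2 * sensitivity)).
    Holding the score gaps fixed, a lower sensitivity makes the distribution
    sharper, so genuinely useful features are picked more often.
    """
    if k <= 0:
        return []
    remaining = dict(scores)
    chosen: List[str] = []
    eps_round = epsilon / k
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        logits = [eps_round * remaining[n] / (2.0 * sensitivity) for n in names]
        top = max(logits)  # shift by the max so exp() cannot overflow
        weights = [math.exp(x - top) for x in logits]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-feature scores, assumed to have sensitivity 1.
    scores = {"age": 40.0, "income": 35.0, "zip": 5.0, "height": 2.0}
    features = private_top_k(scores, k=2, epsilon=1.0, sensitivity=1.0, rng=rng)
    print("selected:", features)
    print("noisy parameters:", noisy_parameters([10.0, 20.0], 1.0, 1.0, rng))
    # Learning fewer parameters means less noise on each one.
    for d in (2, 10, 50):
        print(d, "parameters -> Laplace scale", per_parameter_scale(d, 1.0, 1.0))
```

Under sequential composition, a pipeline that spends ε₁ on selection and ε₂ on the released parameters is (ε₁ + ε₂)-differentially private overall. The sketch makes the trade-off explicit: spending part of the budget to shrink d can pay off because each remaining parameter's noise scale falls linearly with d. If parameters are computed on disjoint partitions of the data, parallel composition can do better than the even split assumed here.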
