ABSTRACT
One approach to analysis of private data is ε-differential privacy, a randomization-based approach that protects individual data items by injecting carefully limited noise into results. A challenge in applying this to private data analysis is that the noise added to the feature parameters is directly proportional to the number of parameters learned. While careful feature selection would alleviate this problem, the process of feature selection itself can reveal private information, requiring the application of differential privacy to the feature selection process. In this paper, we analyze the sensitivity of various feature selection techniques used in data mining and show that some of them are not suitable for differentially private analysis due to high sensitivity. We give experimental results showing the value of using low sensitivity feature selection techniques. We also show that the same concepts can be used to improve differentially private decision trees.
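The abstract's core observation is that when a total privacy budget ε must be split across the parameters being released, the Laplace noise added to each one grows with the number of parameters. A minimal sketch of this effect (the function names `laplace_release` and `noise_scale` are our own illustration, not from the paper; we assume an even budget split and count queries of sensitivity 1):

```python
import numpy as np

def laplace_release(value, sensitivity, epsilon, rng):
    """Standard Laplace mechanism: add noise with scale sensitivity/epsilon."""
    return value + rng.laplace(0.0, sensitivity / epsilon)

def noise_scale(sensitivity, epsilon_total, k):
    """Per-parameter Laplace scale when a total budget epsilon_total
    is divided evenly across k released parameters."""
    return sensitivity / (epsilon_total / k)

rng = np.random.default_rng(0)

# Releasing 100 parameters instead of 10 under the same total budget
# makes each released value ten times noisier.
print(noise_scale(1.0, 1.0, 10))   # 10.0
print(noise_scale(1.0, 1.0, 100))  # 100.0
noisy = laplace_release(42.0, 1.0, 0.1, rng)
```

This is why the paper argues for feature selection before learning: fewer surviving features means a larger per-parameter budget and less noise, provided the selection step itself is performed with a low-sensitivity, differentially private mechanism.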
Differentially Private Feature Selection for Data Mining