ABSTRACT
The research problem of privacy-preserving data publishing is to release microdata in an aggregated form, using techniques that effectively conceal sensitive and private information while still allowing external users to perform data mining. These techniques are usually studied in interactive and non-interactive settings. The non-interactive setting mainly deals with publishing data through anonymization or noise-addition approaches, whereas interactive models answer queries with noisy responses. Most existing approaches for verifying data patterns and determining classification accuracy target non-interactively published microdata. In this paper, we verify the data pattern and determine classification accuracy on an interactive privacy-preservation model called differential privacy. The contributions of this paper are: (1) We present a concise literature review of non-interactive and interactive models and technologies. (2) We propose an approach for retrieving information, and we investigate, understand and experimentally compare data classification accuracy on Privacy Integrated Queries (PINQ). (3) We verify the data pattern by comparing the correlation and classification accuracy of the differentially private data with those of non-interactive k-anonymous data.
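The two settings contrasted above can be illustrated with a minimal sketch, assuming Python and a hypothetical toy table. In the interactive setting, a count query is answered with Laplace noise of scale 1/epsilon (the standard calibration for a query of sensitivity 1); in the non-interactive setting, a published table is checked for k-anonymity over chosen quasi-identifiers. The function names (`laplace_noise`, `noisy_count`, `is_k_anonymous`), the records, and the attribute names are illustrative assumptions, not the paper's PINQ-based implementation.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence

Record = Dict[str, Hashable]

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Guard against log(0), which would occur only when u == -0.5.
    return -scale * math.copysign(1.0, u) * math.log(max(1e-300, 1.0 - 2.0 * abs(u)))

def noisy_count(records: List[Record], predicate: Callable[[Record], bool],
                epsilon: float, rng: random.Random) -> float:
    """Interactive setting (sketch): answer a count query under epsilon-DP.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon is added to the true answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def is_k_anonymous(records: List[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """Non-interactive setting (sketch): every combination of quasi-identifier
    values in the published table must be shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())

# Hypothetical, already generalized microdata (age ranges, truncated zip codes).
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "cancer"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True: each group has 2 records
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
rng = random.Random(0)
print(noisy_count(records, lambda r: r["disease"] == "flu", epsilon=0.5, rng=rng))
```

A smaller epsilon gives stronger privacy but noisier answers; this trade-off is what makes classification accuracy on interactively answered queries differ from accuracy on a k-anonymous release.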