ABSTRACT
The ubiquity of the internet not only makes it convenient for individuals and organizations to share data for data mining or statistical analysis, but also greatly increases the chance of a privacy breach. Many techniques, such as random perturbation, exist to protect the privacy of such data sets. However, perturbation often degrades the quality of data mining or statistical analysis conducted over the perturbed data. This paper studies the impact of random perturbation on a popular data mining and analysis method: linear discriminant analysis. The contributions are twofold. First, we discover that for large data sets, the impact of perturbation is quite limited (i.e., high-quality results may be obtained directly from perturbed data) if the perturbation process satisfies certain conditions. Second, we discover that for small data sets, the negative impact of perturbation can be reduced by publishing additional statistics about the perturbation along with the perturbed data. We provide both theoretical derivations and experimental verifications of these results.
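The first result above can be illustrated with a toy experiment. The sketch below is a minimal illustration, not the paper's actual setup: the synthetic data, the noise level `sigma`, and the helper functions are all hypothetical choices. It applies additive zero-mean Gaussian perturbation to every attribute, then fits a two-class Fisher linear discriminant to both the original and the perturbed data and compares their accuracy on the original data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data set (a hypothetical stand-in for a real data set):
# n points per class, class means at (0, 0) and (2, 2), unit variance.
n = 2000
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)),
               rng.normal(2.0, 1.0, size=(n, 2))])
y = np.repeat([0, 1], n)

def fit_lda(X, y):
    """Two-class Fisher discriminant: w = S_w^{-1}(mu1 - mu0), midpoint threshold."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Sw = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
    w = np.linalg.solve(Sw, mu1 - mu0)
    thresh = w @ (mu0 + mu1) / 2.0
    return w, thresh

def accuracy(w, thresh, X, y):
    pred = (X @ w > thresh).astype(int)
    return float((pred == y).mean())

# Additive zero-mean Gaussian perturbation of every attribute.
sigma = 1.0
Xp = X + rng.normal(0.0, sigma, size=X.shape)

w, t = fit_lda(X, y)       # discriminant learned from the original data
wp, tp = fit_lda(Xp, y)    # discriminant learned from the perturbed data

acc_clean = accuracy(w, t, X, y)
acc_pert = accuracy(wp, tp, X, y)
print(f"original: {acc_clean:.3f}  perturbed: {acc_pert:.3f}")
```

With a large sample, zero-mean noise leaves the class means (and hence the discriminant direction) nearly unchanged, so the two accuracies come out close; shrinking `n` or raising `sigma` widens the gap, consistent with the small-data-set concern the abstract raises.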