DOI: 10.1145/1774088.1774217
research-article

Privacy preserving linear discriminant analysis from perturbed data

Published: 22 March 2010

ABSTRACT

The ubiquity of the Internet makes it convenient for individuals and organizations to share data for data mining or statistical analysis, but it also greatly increases the risk of privacy breaches. Many techniques, such as random perturbation, exist to protect the privacy of such data sets. However, perturbation often degrades the quality of data mining or statistical analysis conducted over the perturbed data. This paper studies the impact of random perturbation on a popular data mining and analysis method: linear discriminant analysis. The contributions are twofold. First, we show that for large data sets, the impact of perturbation is quite limited (i.e., high-quality results may be obtained directly from perturbed data) provided the perturbation process satisfies certain conditions. Second, we show that for small data sets, the negative impact of perturbation can be reduced by publishing additional statistics about the perturbation along with the perturbed data. We provide both theoretical derivations and experimental verification of these results.
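The second contribution can be illustrated with a minimal sketch (not the paper's algorithm). Assume additive zero-mean i.i.d. Gaussian noise whose variance is published alongside the perturbed data: class means estimated from noisy data remain unbiased, and the pooled covariance, inflated by the known noise covariance, can be corrected by subtracting it before computing Fisher's discriminant. All data, names, and the noise model below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data (hypothetical stand-in for a real data set).
n, d = 2000, 2
X0 = rng.normal([0, 0], 1.0, size=(n, d))
X1 = rng.normal([3, 1], 1.0, size=(n, d))

# Random perturbation: add zero-mean Gaussian noise; its variance sigma^2
# is the "additional statistic" published with the perturbed data.
sigma = 1.5
Z0 = X0 + rng.normal(0, sigma, size=X0.shape)
Z1 = X1 + rng.normal(0, sigma, size=X1.shape)

# Fisher's linear discriminant from the perturbed data alone.
# Class means are unbiased under zero-mean noise; the pooled covariance
# is inflated by the noise covariance sigma^2 * I, which we subtract.
mu0, mu1 = Z0.mean(axis=0), Z1.mean(axis=0)
S = 0.5 * (np.cov(Z0.T) + np.cov(Z1.T)) - sigma**2 * np.eye(d)
w = np.linalg.solve(S, mu1 - mu0)      # discriminant direction
threshold = w @ (mu0 + mu1) / 2        # midpoint decision rule

# Evaluate the direction learned from noisy data on the clean data.
pred = (np.vstack([X0, X1]) @ w > threshold).astype(int)
labels = np.array([0] * n + [1] * n)
accuracy = (pred == labels).mean()
print(accuracy)
```

Without the covariance correction, the inflated pooled covariance shrinks and rotates the discriminant direction; with it, the direction approaches the one learned from clean data, which is the intuition behind publishing perturbation statistics for small samples.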


  • Published in

    SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
    March 2010, 2712 pages
    ISBN: 9781605586397
    DOI: 10.1145/1774088
    Copyright © 2010 ACM


    Publisher

    Association for Computing Machinery, New York, NY, United States


    Acceptance Rates

    SAC '10 Paper Acceptance Rate: 364 of 1,353 submissions, 27%
    Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%