DOI: 10.1145/775047.775089
Article

Transforming data to satisfy privacy constraints

Published: 23 July 2002

ABSTRACT

Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving the anonymity by the use of generalizations and suppressions on the potentially identifying portions of the data. We extend earlier works in this area along various dimensions. First, satisfying privacy constraints is considered in conjunction with the usage for the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications like building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations for the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
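The generalization and suppression idea described in the abstract can be illustrated with a minimal sketch. It assumes a k-anonymity-style constraint (every released combination of potentially identifying values must be shared by at least k records), which the abstract itself does not spell out. The function names, the toy records, and the fixed generalization levels are hypothetical; the paper searches over generalizations with a genetic algorithm, which this sketch does not attempt.

```python
from collections import Counter


def generalize_zip(zip_code: str, keep_digits: int) -> str:
    """Generalize a zip code by masking its trailing digits with '*'."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)


def generalize_age(age: int, width: int) -> str:
    """Generalize an age into a half-open interval of the given width."""
    low = (age // width) * width
    return f"[{low},{low + width})"


def anonymize(records, k, keep_digits, age_width):
    """Generalize the potentially identifying columns, then suppress
    every row whose generalized combination occurs fewer than k times.

    Returns (released_rows, suppressed_count).
    """
    generalized = [
        (generalize_zip(z, keep_digits), generalize_age(a, age_width), g, label)
        for (z, a, g, label) in records
    ]
    # Count how many records share each (zip, age, gender) combination.
    group_sizes = Counter(row[:3] for row in generalized)
    released = [row for row in generalized if group_sizes[row[:3]] >= k]
    return released, len(generalized) - len(released)


# Hypothetical toy records: (zip code, age, gender, class label).
records = [
    ("10001", 34, "F", "yes"),
    ("10002", 36, "F", "no"),
    ("10003", 31, "F", "yes"),
    ("10451", 52, "M", "no"),
    ("10452", 58, "M", "no"),
    ("94110", 45, "F", "yes"),
]

released, suppressed = anonymize(records, k=2, keep_digits=3, age_width=10)
```

With these settings the first five records fall into two groups of size three and two, while the last record is alone in its group and is suppressed. Coarser generalizations would suppress fewer rows but blur the attributes more, which is the trade-off the abstract proposes to optimize for the intended use of the data.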


Published in

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN: 158113567X
DOI: 10.1145/775047

Copyright © 2002 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '02 paper acceptance rate: 44 of 307 submissions (14%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)
