Transforming data to satisfy privacy constraints

Author:
Vijay S. Iyengar

Thomas J. Watson Research Center, Yorktown Heights, NY

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 2002, Pages 279–288. https://doi.org/10.1145/775047.775089

Published: 23 July 2002

ABSTRACT

Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving anonymity through generalizations and suppressions applied to the potentially identifying portions of the data. We extend earlier work in this area along several dimensions. First, satisfying privacy constraints is considered in conjunction with the intended usage of the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications like building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations for the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
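As a rough illustration of what generalization and suppression can look like in practice, the following Python sketch coarsens two potentially identifying columns and then drops rows whose combination of generalized values is still too rare. It is not the paper's algorithm: the abstract does not spell out the exact privacy constraint or the genetic-algorithm search, so the group-size threshold `k`, the fixed generalization levels (`keep_digits`, `age_width`), the function names, and the sample records are all assumptions made for this example.

```python
from collections import Counter


def generalize_zip(zip_code: str, keep_digits: int) -> str:
    """Keep the leading digits of a zip code and mask the rest with '*'."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)


def generalize_age(age: int, width: int) -> str:
    """Replace an exact age with a half-open interval of the given width."""
    low = (age // width) * width
    return f"[{low}-{low + width})"


def anonymize(records, k, keep_digits=3, age_width=10):
    """Generalize (zip, age) and suppress rows in groups smaller than k.

    `records` is a list of (zip_code, age, label) tuples. The label stands
    in for the attribute a downstream model would predict; it is left
    untouched. Returns the released rows and the number of suppressed rows.
    """
    generalized = [
        (generalize_zip(z, keep_digits), generalize_age(a, age_width), label)
        for z, a, label in records
    ]
    # Count how many rows share each combination of generalized values.
    group_sizes = Counter((z, a) for z, a, _ in generalized)
    # Suppression: drop every row whose group has fewer than k members.
    kept = [row for row in generalized if group_sizes[(row[0], row[1])] >= k]
    return kept, len(generalized) - len(kept)


def satisfies_constraint(rows, k):
    """Check that every (zip, age) combination occurs at least k times."""
    sizes = Counter((z, a) for z, a, _ in rows)
    return all(size >= k for size in sizes.values())


# Hypothetical sample data, invented for this example.
records = [
    ("10598", 34, "yes"),
    ("10591", 37, "no"),
    ("10562", 31, "yes"),
    ("10598", 52, "no"),
    ("10601", 58, "no"),
    ("94301", 45, "yes"),
]

released, suppressed = anonymize(records, k=2)
```

With these sample records and `k=2`, only the three rows that generalize to the same zip prefix and age interval are released, and the other three are suppressed. The sketch fixes the generalization levels by hand; the abstract describes searching over more flexible generalizations so that the released data stay useful for a specified task such as classification, which a fixed scheme like this one does not attempt.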

Published in
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN: 158113567X
DOI: 10.1145/775047
Conference Chair: Osmar R. Zaïane (University of Alberta, Canada)
General Chair: Randy Goebel (University of Alberta, Canada)
Program Chairs: David Hand (Imperial College, UK), Daniel Keim (AT&T), Raymond Ng (University of British Columbia, Canada)
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: data transformation, generalization, predictive modeling, privacy, suppression
Qualifiers: Article

Acceptance Rates
KDD '02 paper acceptance rate: 44 of 307 submissions, 14%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.