research-article

PCTA: privacy-constrained clustering-based transaction data anonymization

Authors:
Aris Gkoulalas-Divanis, IBM Research-Zurich, Rüschlikon, Switzerland
Grigorios Loukides, Vanderbilt University, Nashville, TN

PAIS '11: Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society, March 2011, Article No.: 5, Pages 1–10. https://doi.org/10.1145/1971690.1971695

Published: 25 March 2011

ABSTRACT

Transaction data about individuals are increasingly collected to support a wide range of applications, from marketing to biomedical studies. Publishing these data is required by many organizations, but may result in privacy breaches if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to release have been proposed recently, but they incur significant information loss because they cannot accommodate the range of privacy requirements that data owners often have. To address this issue, we propose a novel clustering-based framework for anonymizing transaction data. Our framework provides the basis for designing algorithms that explore a larger solution space than existing methods, which allows data to be published with less information loss, and that can satisfy a wide range of privacy requirements. Based on this framework, we develop PCTA, a generalization-based algorithm that constructs anonymizations incurring a small amount of information loss under many different privacy requirements. Experiments with benchmark datasets verify that PCTA significantly outperforms the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.
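
The general idea of generalizing items by clustering them until a set of privacy requirements holds can be illustrated with a minimal Python sketch. This is not the PCTA algorithm itself, whose merge criterion and information-loss measure are not given in the abstract. The sketch assumes that a privacy constraint is a set of items whose generalized form must appear in either no transactions or at least `k` of them, and it uses the size of the merged cluster as a stand-in for information loss. The names `find_cluster`, `support`, and `anonymize` are hypothetical.

```python
def find_cluster(clusters, item):
    """Return the cluster (generalized item) that contains `item`."""
    return next(c for c in clusters if item in c)


def support(transactions, clusters, constraint):
    """Count transactions containing every generalized item the constraint maps to."""
    needed = {find_cluster(clusters, item) for item in constraint}
    return sum(1 for t in transactions if all(t & c for c in needed))


def anonymize(transactions, constraints, k):
    """Greedily merge item clusters until each constraint has support 0 or >= k."""
    items = set().union(*transactions, *constraints)
    clusters = [frozenset([i]) for i in sorted(items)]
    while True:
        # Assumed privacy model: a constraint is violated when 0 < support < k.
        bad = next((c for c in constraints
                    if 0 < support(transactions, clusters, c) < k), None)
        if bad is None:
            return clusters
        if len(clusters) == 1:
            raise ValueError("generalization alone cannot satisfy the constraints")
        # Hypothetical loss proxy: prefer the merge that yields the smallest cluster.
        target = min((find_cluster(clusters, i) for i in sorted(bad)), key=len)
        other = min((c for c in clusters if c != target),
                    key=lambda c: (len(c), sorted(c)))
        clusters = [c for c in clusters if c not in (target, other)]
        clusters.append(target | other)


# Usage: four toy transactions and one constraint on the itemset {a, b}.
transactions = [{"a", "b"}, {"a", "c"}, {"b"}, {"c", "d"}]
clusters = anonymize(transactions, [{"a", "b"}], k=2)
print(clusters)
```

In this toy run the itemset {a, b} initially appears in only one transaction, so the sketch merges a and b into one generalized item, after which three transactions support it. Merging can only raise the support of a constraint, which is why the loop terminates either with all constraints satisfied or with a single cluster left.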

Published in
PAIS '11: Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society
March 2011
62 pages
ISBN: 9781450306119
DOI: 10.1145/1971690
Editors:
Traian Marius Truta, Northern Kentucky University
Li Xiong, Emory University
Farshad Fotouhi, Wayne State University
Kjell Orsborn, Uppsala University, Sweden
Silvia Stefanova, Uppsala University, Sweden
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
anonymity
clustering
database utility
privacy
privacy-preserving data mining
transaction data