Centralized and Distributed Anonymization for High-Dimensional Healthcare Data

Published: 01 October 2010

Abstract

Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients’ privacy. In this article, we study the privacy concerns of sharing patient information between the Hong Kong Red Cross Blood Transfusion Service (BTS) and the public hospitals. We generalize their information and privacy requirements to the problems of centralized anonymization and distributed anonymization, and identify the major challenges that make traditional data anonymization methods inapplicable. Furthermore, we propose a new privacy model called LKC-privacy to overcome these challenges and present two anonymization algorithms to achieve LKC-privacy in both the centralized and the distributed scenarios. Experiments on real-life data demonstrate that our anonymization algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large datasets.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 4, Issue 4
October 2010
121 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1857947
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2010
Accepted: 01 June 2010
Received: 01 January 2010
Published in TKDD Volume 4, Issue 4

Author Tags

  1. Privacy
  2. anonymity
  3. classification
  4. healthcare

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 60
  • Downloads (last 6 weeks): 10

Reflects downloads up to 27 Feb 2025

Cited By

  • (2025) Privacy-preserving multidimensional big data analytics models, methods and techniques: A comprehensive survey. Expert Systems with Applications 270 (126387). DOI: 10.1016/j.eswa.2025.126387. Online publication date: Apr-2025
  • (2024) Optimizing Privacy While Limiting Information Loss in Distributed Data Anonymization. 2024 IEEE International Conference on Big Data (BigData) (352-359). DOI: 10.1109/BigData62323.2024.10825321. Online publication date: 15-Dec-2024
  • (2024) A Review of Anonymization for Healthcare Data. Big Data 12:6 (538-555). DOI: 10.1089/big.2021.0169. Online publication date: 1-Dec-2024
  • (2024) A divide-and-conquer approach to privacy-preserving high-dimensional big data release. Journal of Information Security and Applications 83 (103756). DOI: 10.1016/j.jisa.2024.103756. Online publication date: Jun-2024
  • (2024) Privacy-preserving edge Federated Learning for intelligent mobile-health systems. Future Generation Computer Systems. DOI: 10.1016/j.future.2024.07.035. Online publication date: Jul-2024
  • (2024) Algorithm to satisfy l-diversity by combining dummy records and grouping. Security and Privacy 7:3. DOI: 10.1002/spy2.373. Online publication date: 7-Feb-2024
  • (2023) Survey on Privacy-Preserving Techniques for Microdata Publication. ACM Computing Surveys 55:14s (1-42). DOI: 10.1145/3588765. Online publication date: 28-Mar-2023
  • (2023) Leveraging Data Science to Advance Implementation Science: The Case of School Mental Health. Journal of School Health 93:11 (1045-1048). DOI: 10.1111/josh.13385. Online publication date: 14-Aug-2023
  • (2023) Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy. 2023 International Conference on Smart Applications, Communications and Networking (SmartNets) (1-4). DOI: 10.1109/SmartNets58706.2023.10215825. Online publication date: 25-Jul-2023
  • (2023) Cogni-Sec: A secure cognitive enabled distributed reinforcement learning model for medical cyber–physical system. Internet of Things 24 (100978). DOI: 10.1016/j.iot.2023.100978. Online publication date: Dec-2023
