research-article

On static and dynamic methods for condensation-based privacy-preserving data mining

Authors:

Charu C. Aggarwal,

Philip S. Yu

ACM Transactions on Database Systems (TODS), Volume 33, Issue 1

Article No.: 2, Pages 1 - 39

https://doi.org/10.1145/1331904.1331906

Published: 21 March 2008

Abstract

In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data that is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy-preserving data mining of multidimensional data. Previous work on privacy-preserving data mining uses a perturbation approach that reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution-based algorithm for each data mining problem, since it takes aggregate distributions of the data as input rather than the multidimensional records. This leads to a fundamental redesign of data mining algorithms. We develop a new and flexible approach for privacy-preserving data mining that does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. The anonymized data closely match the characteristics of the original data, including the correlations among the different dimensions. We show how to extend the method to the case of data streams. We present empirical results illustrating the effectiveness of the method and its efficiency for data streams.
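
The abstract does not spell out the procedure, so the following is only an illustrative sketch of the general idea: records are grouped, each group is reduced to its mean and covariance, and synthetic records are drawn from those statistics so that correlations between dimensions survive. Everything here is an assumption made for illustration, not the authors' algorithm: the greedy nearest-neighbour grouping, the group size parameter `k`, the Gaussian sampling through a Cholesky factor, and the function names `condense`, `group_stats`, `cholesky`, and `anonymize`. The input is assumed to hold at least `k` numeric records.

```python
import math
import random


def condense(records, k, rng):
    """Greedily split records into groups of at least k nearby records.

    Hypothetical grouping rule: pick a random seed record and attach its
    k - 1 nearest neighbours. Assumes len(records) >= k.
    """
    remaining = list(records)
    groups = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(rng.randrange(len(remaining)))
        remaining.sort(key=lambda r: math.dist(r, seed))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    if remaining:
        groups.append(remaining)  # leftover group of size k .. 2k - 1
    return groups


def group_stats(group):
    """Return the mean vector and covariance matrix of one group."""
    n, d = len(group), len(group[0])
    mean = [sum(r[j] for r in group) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in group) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(cov):
    """Lower-triangular factor of a positive semidefinite matrix."""
    d = len(cov)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(s, 0.0))
            else:
                low[i][j] = s / low[j][j] if low[j][j] > 1e-12 else 0.0
    return low


def anonymize(records, k, seed=0):
    """Replace every group by synthetic records drawn from its statistics."""
    rng = random.Random(seed)
    out = []
    for group in condense(records, k, rng):
        mean, cov = group_stats(group)
        low = cholesky(cov)
        d = len(mean)
        for _ in group:  # emit as many synthetic records as the group held
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            out.append(tuple(
                mean[i] + sum(low[i][m] * z[m] for m in range(i + 1))
                for i in range(d)))
    return out
```

Calling `anonymize(records, k=5)` on a list of equal-length numeric tuples returns the same number of synthetic tuples. Because the output is again a set of multidimensional records, an existing mining algorithm can be run on it without modification, which is the property the abstract emphasizes.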



Index Terms

1. Information systems
  1. Information systems applications


Published In

ACM Transactions on Database Systems Volume 33, Issue 1

March 2008

211 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/1331904


Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 March 2008

Accepted: 01 October 2007

Revised: 01 October 2006

Received: 01 April 2006


Qualifiers

Research-article
Research
Refereed
