
On static and dynamic methods for condensation-based privacy-preserving data mining

Published: 21 March 2008

Abstract

In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data that is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy-preserving data mining of multidimensional data. Previous work on privacy-preserving data mining uses a perturbation approach, which reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution-based algorithm for each data mining problem, since it uses aggregate distributions of the data as input rather than the multidimensional records themselves. This leads to a fundamental redesign of data mining algorithms. We develop a new and flexible approach for privacy-preserving data mining that does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. The anonymized data closely match the characteristics of the original data, including the correlations among the different dimensions. We show how to extend the method to the case of data streams, and we present empirical results illustrating the effectiveness of the method and its efficiency for data streams.
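The idea of mapping a data set to an anonymized one that keeps inter-dimension correlations can be illustrated with a minimal sketch. It is not the authors' algorithm: the greedy grouping rule, the handling of leftover records, the Gaussian sampling via a Cholesky factor, and all function names (`condense`, `group_stats`, `mean_and_cov`, `cholesky_psd`, `anonymize`) are assumptions made for illustration. Records are placed in groups of at least k, only per-group counts and first- and second-order sums are retained, and synthetic records are drawn to match each group's mean and covariance.

```python
import random
from math import sqrt

def group_stats(group):
    """Return (n, first-order sums, second-order sums) for one group."""
    d = len(group[0])
    fs = [sum(r[j] for r in group) for j in range(d)]
    sc = [[sum(r[i] * r[j] for r in group) for j in range(d)] for i in range(d)]
    return len(group), fs, sc

def mean_and_cov(n, fs, sc):
    """Recover mean vector and covariance matrix from the stored sums."""
    d = len(fs)
    mean = [f / n for f in fs]
    cov = [[sc[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]
    return mean, cov

def condense(records, k):
    """Greedy grouping (an assumed, simplified rule): the first unassigned
    record is the seed and is grouped with its k-1 nearest neighbours."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(records)
    groups = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r, seed)))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Simplification: fewer than k leftovers are added to the last group.
    groups[-1].extend(remaining)
    return [group_stats(g) for g in groups]

def cholesky_psd(a, eps=1e-12):
    """Lower-triangular factor of a positive semi-definite matrix;
    degenerate directions are left as zero columns."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for j in range(d):
        s = a[j][j] - sum(low[j][p] ** 2 for p in range(j))
        if s <= eps:
            continue
        low[j][j] = sqrt(s)
        for i in range(j + 1, d):
            t = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = t / low[j][j]
    return low

def anonymize(stats, rng):
    """Draw n synthetic records per group with that group's mean and covariance."""
    out = []
    for n, fs, sc in stats:
        mean, cov = mean_and_cov(n, fs, sc)
        low = cholesky_psd(cov)
        d = len(mean)
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            out.append(tuple(mean[i] + sum(low[i][p] * z[p] for p in range(i + 1))
                             for i in range(d)))
    return out

if __name__ == "__main__":
    data = [(float(i), 2.0 * i + random.random()) for i in range(20)]
    synthetic = anonymize(condense(data, k=5), random.Random(0))
    print(len(synthetic), "synthetic records generated")
```

Because the output is an ordinary set of multidimensional records, an unmodified mining algorithm could in principle be run on it directly, which is the flexibility the abstract describes. Only the group statistics, not the original records, are needed at generation time.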

    Published In

    ACM Transactions on Database Systems  Volume 33, Issue 1
    March 2008
    211 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/1331904

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 March 2008
    Accepted: 01 October 2007
    Revised: 01 October 2006
    Received: 01 April 2006
    Author Tags

    1. k-anonymity
    2. Privacy
    3. databases data mining

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2023) A privacy-enhanced human activity recognition using GAN & entropy ranking of microaggregated data. Cluster Computing 27:2 (2117-2132). DOI: 10.1007/s10586-023-04063-1. Online publication date: 22-Jun-2023
    • (2023) Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review. Artificial Intelligence Review 56:9 (10427-10464). DOI: 10.1007/s10462-023-10425-3. Online publication date: 22-Feb-2023
    • (2022) Privacy Preserving Human Activity Recognition Using Microaggregated Generative Deep Learning. 2022 IEEE International Conference on Cyber Security and Resilience (CSR) (357-363). DOI: 10.1109/CSR54599.2022.9850328. Online publication date: 27-Jul-2022
    • (2021) Privacy preserving defect prediction using generalization and entropy-based data reduction. Intelligent Data Analysis 25:6 (1369-1405). DOI: 10.3233/IDA-205504. Online publication date: 29-Oct-2021
    • (2021) Sensitive Label Privacy Preservation with Anatomization for Data Publishing. IEEE Transactions on Dependable and Secure Computing 18:2 (904-917). DOI: 10.1109/TDSC.2019.2919833. Online publication date: 1-Mar-2021
    • (2020) Privacy-Preserving Data Publishing in Process Mining. Business Process Management Forum (122-138). DOI: 10.1007/978-3-030-58638-6_8. Online publication date: 2-Sep-2020
    • (2019) An Efficient and Scalable Privacy Preserving Algorithm for Big Data and Data Streams. Computers & Security (101570). DOI: 10.1016/j.cose.2019.101570. Online publication date: Jul-2019
    • (2019) Multidimensional Correlation Hierarchical Differential Privacy for Medical Data with Multiple Privacy Requirements. Proceedings of the 2nd International Conference on Healthcare Science and Engineering (153-173). DOI: 10.1007/978-981-13-6837-0_12. Online publication date: 10-May-2019
    • (2019) Comparative Analysis of Privacy Preserving Approaches for Collaborative Data Processing. Intelligent Communication Technologies and Virtual Mobile Networks (199-206). DOI: 10.1007/978-3-030-28364-3_18. Online publication date: 13-Aug-2019
    • (2018) Efficient data perturbation for privacy preserving and accurate data stream mining. Pervasive and Mobile Computing 48 (1-19). DOI: 10.1016/j.pmcj.2018.05.003. Online publication date: Aug-2018
