Abstract
To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified, substantially enhancing the accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, which is a general-purpose random-perturbation-based privacy-preserving mining technique, is evaluated specifically with regard to association and classification rule mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, either substantially lower modeling errors are incurred as compared to the prior techniques, or the errors are comparable to those of direct mining on the true database.
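As a rough illustration of the matrix view described above, the following sketch treats perturbation over a small categorical domain as a column-stochastic matrix A, so that the expected perturbed histogram is Y = A X and the original histogram is estimated by solving A X̂ = Y. The specific matrix form (a constant off-diagonal value with a diagonal `gamma` times larger), the parameter `gamma`, the domain size, the sample data, and all function names are assumptions made for this example; they are not taken from the abstract.

```python
import numpy as np


def make_perturbation_matrix(n, gamma):
    """Hypothetical symmetric matrix: off-diagonal x, diagonal gamma * x.

    x is chosen so that every column sums to 1 (a valid transition matrix).
    """
    x = 1.0 / (gamma + n - 1)
    return x * (np.ones((n, n)) + (gamma - 1.0) * np.eye(n))


def perturb(records, A, rng):
    """Replace each record u by a value v drawn with probability A[v, u]."""
    n = A.shape[0]
    return np.array([rng.choice(n, p=A[:, u]) for u in records])


def reconstruct(perturbed, A):
    """Estimate the original histogram X_hat by solving A @ X_hat = Y."""
    Y = np.bincount(perturbed, minlength=A.shape[0]).astype(float)
    return np.linalg.solve(A, Y)


rng = np.random.default_rng(0)
n, gamma = 4, 19.0                      # assumed domain size and parameter
A = make_perturbation_matrix(n, gamma)

# Synthetic "true" records drawn from an arbitrary skewed distribution.
records = rng.choice(n, size=10000, p=[0.5, 0.3, 0.15, 0.05])
X_true = np.bincount(records, minlength=n)

X_hat = reconstruct(perturb(records, A, rng), A)
cond = np.linalg.cond(A)                # 2-norm condition number of A

print("true histogram:         ", X_true)
print("reconstructed histogram:", np.round(X_hat, 1))
print("condition number:       ", cond)
```

For this assumed matrix form and `gamma > 1`, the matrix is symmetric positive-definite and its condition number works out to `(gamma + n - 1) / (gamma - 1)`. A condition number close to 1 means that sampling noise in Y is amplified only mildly when solving for X̂, which is the intuition behind preferring a well-conditioned perturbation matrix.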
Additional information
Responsible editor: Johannes Gehrke.
A partial and preliminary version of this paper appeared in the Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE), Tokyo, Japan, 2005, pp. 193–204.
Cite this article
Agrawal, S., Haritsa, J.R. & Prakash, B.A. FRAPP: a framework for high-accuracy privacy-preserving mining. Data Min Knowl Disc 18, 101–139 (2009). https://doi.org/10.1007/s10618-008-0119-9
DOI: https://doi.org/10.1007/s10618-008-0119-9