Data mining services require accurate input data for their results to be meaningful, but privacy concerns may impel users to provide spurious information. In this chapter, we study whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. Our analysis is in the context of extracting association rules from large historical databases, a popular mining process that identifies interesting correlations between database attributes. We analyze the various schemes that have been proposed for this purpose with regard to a variety of parameters including the degree of trust, privacy metric, model accuracy and mining efficiency.
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Haritsa, J.R. (2008). Mining Association Rules under Privacy Constraints. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_10
DOI: https://doi.org/10.1007/978-0-387-70992-5_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-70991-8
Online ISBN: 978-0-387-70992-5
eBook Packages: Computer Science (R0)