Privacy-Preserving Data Mining

Brankovic, Ljiljana; Islam, Md. Zahidul; Giggins, Helen

doi:10.1007/978-3-540-69861-6_11


Part of the book series: Data-Centric Systems and Applications (DCSA)


Abstract

Despite enormous benefits and the extremely fast proliferation of data mining in recent years, data owners and researchers alike have acknowledged that data mining also revives old and introduces new threats to individual privacy. Many believe that data mining is, and will continue to be, one of the most significant privacy challenges in years to come.

We live in an information age where vast amounts of personal data are regularly collected in the process of bank transactions, credit-card payments, making phone calls, using reward cards, visiting doctors and renting videos and cars, to mention but a few examples. All these data are typically used for data mining and statistical analysis and are often sold to other companies and organizations.

A breach of privacy occurs when individuals are not aware that the data have been collected in the first place, have been passed on to other companies and organizations, or have been used for purposes other than the one for which they were originally collected. Even when individuals approve of the use of their personal records for data mining and statistical analysis, for example in medical research, it is still assumed that only aggregate values will be made available to researchers and that no individual values will be disclosed.

Various techniques can be employed to ensure the confidentiality of individual records and other sensitive information. They include adding noise to the original data, so that disclosing the perturbed data does not necessarily reveal the confidential individual values. Some techniques were developed specifically for mining vertically and/or horizontally partitioned data. In this scenario each partition belongs to a different party (e.g., a hospital); no party is willing to share its data, but all have an interest in mining the total data set comprising all of the partitions. Other techniques focus on protecting the confidentiality of logic rules and patterns discovered from data.
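
As a rough illustration of the noise-addition idea mentioned above, the following Python sketch adds independent zero-mean Gaussian noise to a confidential numeric attribute. It is a minimal example under stated assumptions, not the method of any particular paper surveyed in the chapter: the function name `perturb`, the noise level and the salary figures are all hypothetical.

```python
import random
import statistics


def perturb(values, noise_sd, seed=None):
    """Return a copy of `values` with independent zero-mean Gaussian noise added.

    `noise_sd` is the standard deviation of the noise (an assumed tuning
    parameter: larger values hide individual records better but distort
    the released data more).
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Hypothetical confidential attribute (illustrative numbers only).
salaries = [52_000, 61_500, 48_200, 75_000, 58_300, 66_100, 49_900, 71_400]

# Only the perturbed values would be released.
released = perturb(salaries, noise_sd=5_000, seed=42)

# Individual values differ from the originals, while the mean of the
# released data estimates the true mean (more closely as the data set grows).
print("true mean:    ", round(statistics.mean(salaries)))
print("released mean:", round(statistics.mean(released)))
```

Because the noise has zero mean, aggregate statistics such as the average tend to survive perturbation on large data sets, whereas any single released value no longer equals the original one.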

In this chapter we introduce the main issues in privacy-preserving data mining, provide a classification of existing techniques and survey the most important results in this area.



Author information

Authors and Affiliations

The University of Newcastle, Australia
Ljiljana Brankovic, Md. Zahidul Islam & Helen Giggins


Editor information

Editors and Affiliations

Philips Research Europe, High Tech Campus 34, 5656 AE, Eindhoven, The Netherlands
Milan Petković
Philips Research Europe, Philips Research / Twente University, High Tech Campus 34, 5656 AE, Eindhoven, The Netherlands
Willem Jonker


About this chapter

Cite this chapter

Brankovic, L., Islam, M.Z., Giggins, H. (2007). Privacy-Preserving Data Mining. In: Petković, M., Jonker, W. (eds) Security, Privacy, and Trust in Modern Data Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69861-6_11


DOI: https://doi.org/10.1007/978-3-540-69861-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69860-9
Online ISBN: 978-3-540-69861-6
eBook Packages: Computer Science, Computer Science (R0)
