Guided perturbation: towards private and accurate mining

  • Regular Paper
The VLDB Journal

Abstract

Two approaches have been proposed for privacy-preserving data mining: the perturbation approach and the cryptographic approach. The perturbation approach is typically very efficient, but it suffers from a tradeoff between accuracy and privacy. In contrast, the cryptographic approach usually preserves accuracy, but it incurs higher computation and communication overhead. We propose a novel perturbation method, called guided perturbation. Specifically, we focus on a central problem of privacy-preserving data mining, the secure scalar product problem for vertically partitioned data, and give a solution based on guided perturbation with a good, provable privacy guarantee. Our solution achieves accuracy comparable to that of cryptographic solutions while retaining the efficiency of perturbation solutions. Our experimental results show that it can be more than one hundred times faster than a typical cryptographic solution.
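The abstract does not spell out the guided perturbation protocol, so the sketch below does not implement it. It only illustrates, under stated assumptions, the setting and the two baseline families the abstract contrasts. First, with vertically partitioned boolean data, the support count of an itemset that spans both parties is the scalar product of their columns. Second, a naive additive-noise perturbation scheme is cheap, but its error grows with the noise level. Third, a simplified protocol built on the Paillier additively homomorphic cryptosystem is exact, but it costs modular exponentiations for every coordinate. All function names, the toy data, and the tiny primes are hypothetical and insecure, chosen only for illustration.

```python
import math
import random


def scalar_product(x, y):
    """Exact, non-private scalar product x . y (the ground truth)."""
    return sum(a * b for a, b in zip(x, y))


def perturbed_scalar_product(x, y, sigma, rng):
    """Naive additive perturbation (hypothetical baseline, NOT guided perturbation).

    Alice sends x + Gaussian noise to Bob, who returns the dot product with y.
    The error equals sum(noise_i * y_i): a larger sigma hides x better but
    makes the answer less accurate.
    """
    noisy_x = [a + rng.gauss(0.0, sigma) for a in x]  # what Bob sees
    return sum(a * b for a, b in zip(noisy_x, y))


def paillier_keygen(p, q):
    """Toy Paillier keys from two distinct primes (insecure sizes), g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # Carmichael function of n (Python 3.9+)
    mu = pow(lam, -1, n)          # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def paillier_encrypt(n, m, rng):
    """E(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def paillier_decrypt(n, priv, c):
    """D(c) = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def homomorphic_scalar_product(x, y, n, priv, rng):
    """Simplified Paillier-based protocol: exact, but exponentiation-heavy."""
    n2 = n * n
    ciphertexts = [paillier_encrypt(n, a, rng) for a in x]  # Alice -> Bob
    acc = 1
    for c, b in zip(ciphertexts, y):                          # Bob never sees x
        acc = acc * pow(c, b, n2) % n2  # prod E(x_i)^(y_i) = E(x . y)
    return paillier_decrypt(n, priv, acc)                     # Alice decrypts


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical vertically partitioned table with 8 shared records:
    # Alice holds boolean column A, Bob holds boolean column B.
    alice = [1, 0, 1, 1, 0, 1, 1, 0]
    bob = [1, 1, 1, 0, 0, 1, 0, 0]
    exact = scalar_product(alice, bob)
    print("support({A, B}) =", exact)  # 3
    n, priv = paillier_keygen(1009, 1013)
    print("Paillier result =", homomorphic_scalar_product(alice, bob, n, priv, rng))  # 3
    for sigma in (0.1, 1.0, 10.0):
        errs = [abs(perturbed_scalar_product(alice, bob, sigma, rng) - exact)
                for _ in range(1000)]
        print(f"sigma={sigma}: mean abs error = {sum(errs) / len(errs):.2f}")
```

In this simplified version Alice learns x . y itself. Practical homomorphic protocols typically have Bob blind the aggregated ciphertext with a random share, so that each party ends up holding only a share of the product.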

Author information

Corresponding author

Correspondence to Sheng Zhong.

About this article

Cite this article

Zhong, S., Yang, Z. Guided perturbation: towards private and accurate mining. The VLDB Journal 17, 1165–1177 (2008). https://doi.org/10.1007/s00778-007-0056-z

  • DOI: https://doi.org/10.1007/s00778-007-0056-z