Skip to main content
Log in

Determining error bounds for spectral filtering based reconstruction methods in privacy preserving data mining

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Additive randomization has been a primary tool for hiding sensitive private information. Previous work empirically showed that individual data values can be approximately reconstructed from the perturbed values, using spectral filtering techniques. This poses a serious threat of privacy breaches. In this paper we conduct a theoretical study on how the reconstruction error varies, for different types of additive noise. In particular, we first derive an upper bound for the reconstruction error using matrix perturbation theory. Attackers who use spectral filtering techniques to estimate the true data values may leverage this bound to determine how close their estimates are to the original data. We then derive a lower bound for the reconstruction error, which can help data owners decide how much noise should be added to satisfy a given threshold of the tolerated privacy breach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Adam NR, Wortman JC (1989) Security-control methods for statistical databases. ACM Comput Surv 21(4): 515–556

    Article  Google Scholar 

  2. Aggarwal C, Yu P (2004) A condensation approach to privacy preserving data mining. In: Proceedings of international conference on extending database technology, LNCS, vol 2992, pp 183–199

  3. Agrawal D, Agrawal C (2001) On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the 20th symposium on principles of database systems, Santa Barbara, pp 247–255

  4. Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the ACM SIGMOD international conference on management of data, Dallas, pp 439–450

  5. Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine. http://www.ics.uci.edu/~mlearn/MLRepository.html

  6. Chen K, Liu L (2005) Privacy preserving data classification with rotation perturbation. In: Proceedings of the 5th IEEE international conference on data mining, Houston pp 589–592

  7. Dobra A, Fienberg SE (2001) Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation. Stat J United Nations ECE 18, pp 363–371

    Google Scholar 

  8. Evfimievski A (2002) Randomization in privacy preserving data mining. SIGKDD Explor 4(2): 43–48

    Article  Google Scholar 

  9. Evfimievski A, Gehrke J, Srikant R (2003) Limiting privacy breaches in privacy preserving data mining. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART symposium on principles of database system, pp 211–222

  10. Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, Edmonton, pp 217–228

  11. Guo S, Wu X (2006) On the use of spectral filtering for privacy preserving data mining. In: Proceedings of the 21st ACM symposium on applied computing, pp 622–626

  12. Guo S, Wu X (2007) Deriving private information from arbitrarily projected data. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining, pp 84–95

  13. Guo S, Wu X, Li Y (2006a) Deriving private information from perturbed data using iqr based approach. In: Proceedings of the 22nd international conference on data engineering workshops, pp 92–101

  14. Guo S, Wu X, Li Y (2006b) On the lower bound of reconstruction error for spectral filtering based privacy preserving data mining. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, Berlin, pp 520–527

  15. Huang Z, Du W, Chen B (2005) Deriving private information from randomized data. In: Proceedings of the ACM SIGMOD conference on management of data, Baltimore, pp 37–48

  16. Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 599–604

  17. Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Proceedings of the 3rd international conference on data mining, pp 99–106

  18. Kargupta H, Datta S, Wang Q, Sivakumar K (2005) Random-data perturbation techniques and privacy-preserving data mining. Knowl Inf Syst 7(4): 387–414

    Article  Google Scholar 

  19. Liu K, Giannella C, Kargupta H (2006) An attacker’s view of distance preserving maps for privacy preserving data mining. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD’06), Berlin, pp 297–308

  20. Liu K, Kargupta H, Ryan J (2006) Random projection based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans Knowl Data Eng 18(1): 92–106

    Article  Google Scholar 

  21. Stewart G, Sun J (1990) Matrix perturbation theory. Academic Press, London

    MATH  Google Scholar 

  22. Sung SY, Liu Y, Xiong H, Ng PA (2006) Privacy preservation for data cubes. Knowl Inf Syst 9(1): 38–61

    Article  Google Scholar 

  23. Wang K, Fung BCM, Yu PS (2007) Handicapping attacker’s confidence: an alternative to k-anonymization. Knowl Inf Syst 11(3): 345–368

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xintao Wu.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Guo, S., Wu, X. & Li, Y. Determining error bounds for spectral filtering based reconstruction methods in privacy preserving data mining. Knowl Inf Syst 17, 217–240 (2008). https://doi.org/10.1007/s10115-008-0123-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-008-0123-9

Keywords

Navigation