
Reversible privacy preserving data mining: a combination of difference expansion and privacy preserving

The Journal of Supercomputing

Abstract

Privacy Preserving Data Mining (PPDM) can prevent private data from being disclosed during data mining. However, current PPDM methods alter the values of the original data, so knowledge mined from the protected data cannot be verified against the original data. In this paper, we combine concepts and techniques from reversible data hiding to propose a reversible privacy preserving data mining scheme that solves the irrecoverability problem of PPDM. In the proposed privacy difference expansion (PDE) method, the original data are perturbed and embedded with a fragile watermark. This provides privacy preservation and integrity checking for the mined data while still allowing the original data to be recovered. Experiments measuring classification accuracy, probabilistic information loss, and privacy disclosure risk are used to evaluate the efficiency of PDE for privacy preservation and knowledge verification.
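The abstract does not spell out the PDE algorithm itself. As a rough illustration of the reversible-embedding idea it builds on, here is a minimal sketch of Tian's (2003) difference expansion applied to pairs of integer attribute values. A SHA-256 digest of the original values serves as a fragile watermark. This is not the authors' PDE method. Several choices are assumptions made only for illustration: pairing consecutive values, treating attributes as integer-scaled and unbounded (so no overflow check), and using a hash-based watermark. The helper names `de_embed_pair`, `de_extract_pair`, `digest_bits`, `embed_watermark` and `verify_and_restore` are hypothetical. PDE's separate privacy perturbation step is not modelled here; the only distortion comes from expanding the differences.

```python
import hashlib

# Sketch only: Tian-style difference expansion (DE) on integer pairs, plus a
# hypothetical SHA-256 fragile watermark. Not the paper's PDE algorithm.


def de_embed_pair(x: int, y: int, bit: int) -> tuple[int, int]:
    """Embed one bit into the integer pair (x, y) by difference expansion."""
    l = (x + y) // 2          # floor average; stays invariant after embedding
    h = x - y                 # original difference
    h2 = 2 * h + bit          # expanded difference carries the bit in its LSB
    x2 = l + (h2 + 1) // 2
    y2 = x2 - h2
    return x2, y2


def de_extract_pair(x2: int, y2: int) -> tuple[int, int, int]:
    """Recover the original pair and the embedded bit from a marked pair."""
    l = (x2 + y2) // 2        # same floor average as before embedding
    h2 = x2 - y2
    bit = h2 & 1              # LSB of the expanded difference (also for negatives)
    h = h2 >> 1               # floor(h2 / 2) restores the original difference
    x = l + (h + 1) // 2
    y = x - h
    return x, y, bit


def digest_bits(values: list[int], n: int) -> list[int]:
    """First n bits of SHA-256 over the values (assumed watermark source)."""
    if n > 256:
        raise ValueError("this sketch supports at most 256 pairs")
    d = hashlib.sha256(",".join(map(str, values)).encode()).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in d for i in range(8)]
    return bits[:n]


def embed_watermark(values: list[int]) -> list[int]:
    """Embed digest bits of the original values into consecutive pairs."""
    n_pairs = len(values) // 2
    bits = digest_bits(values, n_pairs)
    marked = list(values)     # an odd trailing value is left unmarked
    for k, b in enumerate(bits):
        marked[2 * k], marked[2 * k + 1] = de_embed_pair(values[2 * k], values[2 * k + 1], b)
    return marked


def verify_and_restore(marked: list[int]) -> tuple[list[int], bool]:
    """Restore the original values and check the fragile watermark."""
    restored = list(marked)
    extracted = []
    for k in range(len(marked) // 2):
        x, y, b = de_extract_pair(marked[2 * k], marked[2 * k + 1])
        restored[2 * k], restored[2 * k + 1] = x, y
        extracted.append(b)
    # Any change to the marked data should make the extracted bits disagree
    # with the digest recomputed from the restored values.
    ok = extracted == digest_bits(restored, len(extracted))
    return restored, ok


if __name__ == "__main__":
    record = [52, 48, 61, 70, 33, 35, 90]   # hypothetical integer-scaled values
    marked = embed_watermark(record)
    restored, ok = verify_and_restore(marked)
    assert restored == record and ok
```

Because difference expansion keeps the floor average of each pair unchanged, the extractor recovers the original pair exactly. The watermark check is fragile in the sense that editing any marked value flips an extracted bit or changes the restored values, and either change almost surely breaks the match with the recomputed digest.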



Acknowledgements

This work was supported partially by the National Science Council of the Republic of China under Grant NSC 100-2221-E-025-014-MY2.

Author information

Corresponding author

Correspondence to Yuan-Hung Kao.


About this article

Cite this article

Chen, TS., Lee, WB., Chen, J. et al. Reversible privacy preserving data mining: a combination of difference expansion and privacy preserving. J Supercomput 66, 907–917 (2013). https://doi.org/10.1007/s11227-013-0926-7


  • DOI: https://doi.org/10.1007/s11227-013-0926-7
