Abstract
Privacy Preserving Data Mining (PPDM) prevents private data from being disclosed during data mining. However, current PPDM methods distort the values of the original data, so knowledge mined from the protected data cannot be verified against the original data. In this paper, we combine PPDM with the concepts and techniques of reversible data hiding to propose a reversible privacy preserving data mining scheme that solves the irrecoverability problem of PPDM. In the proposed privacy difference expansion (PDE) method, the original data is perturbed and embedded with a fragile watermark. This achieves privacy preservation, protects the integrity of the mined data, and allows the original data to be recovered. Experiments on classification accuracy, probabilistic information loss, and privacy disclosure risk evaluate the effectiveness of PDE for privacy preservation and knowledge verification.
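To make the reversibility idea concrete, the following is a minimal sketch of the classic difference expansion transform on a pair of integers, the reversible data hiding technique that gives PDE its name. It is not the PDE algorithm itself, which the abstract does not specify: the function names `de_embed` and `de_extract`, the pairing of values, and the one-bit payload are all assumptions made for illustration, and range or overflow checks are omitted.

```python
def de_embed(x: int, y: int, bit: int) -> tuple[int, int]:
    """Perturb the integer pair (x, y) while hiding one bit (illustrative sketch)."""
    avg = (x + y) // 2           # integer average, preserved by the transform
    diff = x - y                 # difference between the two values
    expanded = 2 * diff + bit    # expand the difference and append the bit
    return avg + (expanded + 1) // 2, avg - expanded // 2


def de_extract(x2: int, y2: int) -> tuple[int, int, int]:
    """Recover the original pair and the hidden bit from a perturbed pair."""
    avg = (x2 + y2) // 2         # same integer average as before embedding
    expanded = x2 - y2           # expanded difference
    bit = expanded % 2           # hidden bit is the least significant bit
    diff = expanded // 2         # original difference (floor division)
    return avg + (diff + 1) // 2, avg - diff // 2, bit


if __name__ == "__main__":
    original = (206, 201)
    perturbed = de_embed(*original, bit=1)
    restored = de_extract(*perturbed)
    print(original, perturbed, restored)
```

In this sketch the perturbed pair differs from the original pair, yet the original values and the hidden bit are recovered exactly. A receiver could compare the extracted bits with an expected watermark sequence to detect tampering, which is one way a fragile watermark can support integrity checking.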
Acknowledgements
This work was supported in part by the National Science Council of the Republic of China under Grant NSC 100-2221-E-025-014-MY2.
Cite this article
Chen, TS., Lee, WB., Chen, J. et al. Reversible privacy preserving data mining: a combination of difference expansion and privacy preserving. J Supercomput 66, 907–917 (2013). https://doi.org/10.1007/s11227-013-0926-7
DOI: https://doi.org/10.1007/s11227-013-0926-7