Reversible privacy preserving data mining: a combination of difference expansion and privacy preserving

Chen, Tung-Shou; Lee, Wei-Bin; Chen, Jeanne; Kao, Yuan-Hung; Hou, Pei-Wen

doi:10.1007/s11227-013-0926-7

Published: 10 April 2013

Volume 66, pages 907–917, (2013)

The Journal of Supercomputing

Tung-Shou Chen¹, Wei-Bin Lee², Jeanne Chen¹, Yuan-Hung Kao² & Pei-Wen Hou¹

Abstract

Privacy preserving data mining (PPDM) prevents private data from being disclosed during data mining. However, current PPDM methods damage the original data values, so knowledge mined from the protected data cannot be verified against the original data. In this paper, we apply the concepts and techniques of reversible data hiding to propose a reversible privacy preserving data mining scheme that solves this irrecoverability problem of PPDM. In the proposed privacy difference expansion (PDE) method, the original data are perturbed and embedded with a fragile watermark, which preserves privacy, protects the integrity of the mined data, and allows the original data to be recovered. Experiments on classification accuracy, probabilistic information loss, and privacy disclosure risk evaluate the efficiency of PDE for privacy preservation and knowledge verification.
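
The abstract does not spell out the individual PDE steps. As background only, the following is a minimal sketch of the classic difference expansion transform on a single pair of integers, the reversible data hiding technique that PDE is named after. It is not the authors' PDE method: the function names `embed_bit` and `extract_bit` are hypothetical, the inputs are assumed to be integer-valued, and overflow or range checks are omitted.

```python
def embed_bit(x: int, y: int, bit: int) -> tuple[int, int]:
    """Hide one bit in the integer pair (x, y) by expanding their difference."""
    avg = (x + y) // 2           # integer average, unchanged by the embedding
    diff = x - y                 # difference that will carry the bit
    expanded = 2 * diff + bit    # shift the difference left and append the bit
    # Rebuild the pair from the same average and the expanded difference.
    return avg + (expanded + 1) // 2, avg - expanded // 2


def extract_bit(x_marked: int, y_marked: int) -> tuple[int, int, int]:
    """Recover the original pair and the hidden bit from a marked pair."""
    avg = (x_marked + y_marked) // 2
    expanded = x_marked - y_marked
    bit = expanded & 1           # the appended bit is the least significant bit
    diff = expanded // 2         # floor division undoes the expansion
    return avg + (diff + 1) // 2, avg - diff // 2, bit


if __name__ == "__main__":
    marked = embed_bit(206, 201, 1)
    print(marked)                 # perturbed pair that is released
    print(extract_bit(*marked))   # original pair plus the hidden bit
```

The marked pair differs from the original pair, which is the sense in which the data are perturbed, yet the original values and the embedded bit are both recoverable exactly. In a fragile watermarking setting, the embedded bits would typically be checked after extraction to detect tampering; how PDE chooses and verifies those bits is not described in the abstract.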

Acknowledgements

This work was partially supported by the National Science Council of the Republic of China under Grant NSC 100-2221-E-025-014-MY2.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taichung University of Science and Technology, Taichung City, 404, Taiwan, R.O.C.
Tung-Shou Chen, Jeanne Chen & Pei-Wen Hou
Department of Information Engineering and Computer Science, Feng Chia University, Taichung City, 407, Taiwan, R.O.C.
Wei-Bin Lee & Yuan-Hung Kao

Corresponding author

Correspondence to Yuan-Hung Kao.

About this article

Cite this article

Chen, TS., Lee, WB., Chen, J. et al. Reversible privacy preserving data mining: a combination of difference expansion and privacy preserving. J Supercomput 66, 907–917 (2013). https://doi.org/10.1007/s11227-013-0926-7

Issue Date: November 2013