Privacy-preserving of SVM over vertically partitioned with imputing missing data

Omer, Mohammed Z.; Gao, Hui; Mustafa, Nadir

doi:10.1007/s10619-017-7203-3


Published: 09 September 2017

Distributed and Parallel Databases, Volume 35, pages 363–382 (2017)


Abstract

Most distributed data mining algorithms can efficiently manage and mine complete data from distributed resources. For incomplete data, however, modifications are required so that distributed data mining techniques can still produce good results while preserving the privacy of sensitive information. Classification is an important data mining task aimed at discovering knowledge and classifying new instances, and the support vector machine (SVM) is one of the most important algorithms for classification problems in many fields. In this paper, we propose a new distributed privacy-preserving protocol with multiple imputation of missing or incomplete data. Multiple imputation based on multivariate imputation by chained equations (MICE) is used for the missing data, and the Paillier cryptosystem is used to maintain the privacy of the participants. Finally, we construct a global SVM model over vertically partitioned data, based on the Gram matrix and with the help of a semi-honest third party, without revealing the participants' private data; the model is then used to classify new instances. The performance of the proposed protocol was evaluated using the accuracy metric on distributed and centralized data. Our experimental results show that the accuracy is the same as on centralized data, and that imputing the missing data gives better results than omitting it. With our protocol, distributed data also achieves better processing time than centralized data.
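The idea of building a global model from a Gram matrix over vertically partitioned data can be illustrated with a small sketch. When each party holds different feature columns of the same records, the linear-kernel Gram matrix of the full data equals the entry-wise sum of the parties' local Gram matrices, and Paillier encryption allows that sum to be computed on ciphertexts. The code below is only an illustration of these two facts, not the protocol of the paper: the toy primes, the data in `X_a` and `X_b`, the helper names, and the assignment of roles (who encrypts, who aggregates, who decrypts) are all assumptions made for the example.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys. These primes are illustrative and far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lambda mod n (valid for generator g = n + 1)
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(1 + n, m, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """D(c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def gram(X):
    """Linear-kernel Gram matrix: K[i][j] is the inner product of rows i and j."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

# Hypothetical vertical partition: the same 3 records, 2 feature columns per party.
X_a = [[1, 2], [0, 3], [4, 1]]
X_b = [[2, 0], [1, 1], [3, 5]]

rng = random.Random(0)  # deterministic for the example; not cryptographically secure
n, priv = keygen()
n2 = n * n

# Each party encrypts the entries of its local Gram matrix under the public key n.
enc_a = [[encrypt(n, v, rng) for v in row] for row in gram(X_a)]
enc_b = [[encrypt(n, v, rng) for v in row] for row in gram(X_b)]

# Multiplying Paillier ciphertexts adds the underlying plaintexts,
# so an aggregator can sum the matrices without seeing any entry.
enc_sum = [[ca * cb % n2 for ca, cb in zip(ra, rb)]
           for ra, rb in zip(enc_a, enc_b)]

# Only the holder of the private key recovers the global Gram matrix.
global_gram = [[decrypt(n, priv, c) for c in row] for row in enc_sum]

# Reference: Gram matrix computed on the centralized (column-joined) data.
central_gram = gram([ra + rb for ra, rb in zip(X_a, X_b)])
print(global_gram == central_gram)
```

Because the SVM dual problem depends on the training data only through kernel values, a Gram matrix recovered this way is sufficient input for training. The sketch handles only non-negative integer entries whose sum stays below `n`; real-valued features would need a fixed-point encoding, which is omitted here.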



Author information

Authors and Affiliations

School of Computer Science & Engineering, UESTC, Chengdu, 611731, China
Mohammed Z. Omer, Hui Gao & Nadir Mustafa
Big Data Research Centre, UESTC, Chengdu, 611731, China
Mohammed Z. Omer & Hui Gao

Corresponding author

Correspondence to Mohammed Z. Omer.

About this article

Cite this article

Omer, M.Z., Gao, H. & Mustafa, N. Privacy-preserving of SVM over vertically partitioned with imputing missing data. Distrib Parallel Databases 35, 363–382 (2017). https://doi.org/10.1007/s10619-017-7203-3

Issue Date: December 2017
DOI: https://doi.org/10.1007/s10619-017-7203-3
