Privacy-preserving of SVM over vertically partitioned with imputing missing data

Distributed and Parallel Databases

Abstract

Most distributed data mining algorithms can efficiently manage and mine complete data from distributed sources. Incomplete data, however, requires modifications to these techniques so that they still produce good data mining results while preserving the privacy of sensitive information. Classification is one of the most important data mining tasks; it aims to discover knowledge and classify new instances. The support vector machine (SVM) is one of the most important algorithms for classification problems across many domains. In this paper, we propose a new distributed privacy-preserving protocol with multiple imputation of missing or incomplete data. Multiple imputation based on multivariate imputation by chained equations (MICE) is used to fill in missing data, and the Paillier cryptosystem is used to preserve the privacy of the participants. Finally, we construct a global SVM model over vertically partitioned data, based on the Gram matrix, by introducing a semi-honest third party, without revealing the participants' private data, and we use this model to classify new instances. The performance of the proposed protocol was evaluated with the accuracy metric on both distributed and centralized data. Our experimental results show that the accuracy on distributed data is the same as on centralized data, and that imputing missing data yields better results than omitting it. In addition, the protocol achieves better processing time on distributed data than on centralized data.
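The abstract names multiple imputation by chained equations (MICE) as the imputation step. As a rough illustration only, the Python sketch below runs a simplified MICE on a small numeric table in which `None` marks a missing value: each incomplete column starts from its observed mean, is then repeatedly regressed (least squares with an intercept) on all other columns, and its missing cells are redrawn as the prediction plus Gaussian noise with the residual standard deviation. Repeating this with fresh random draws gives `m` completed tables. The names `mice`, `_fit`, `_solve` and `_features`, the tiny ridge term, the defaults `m=5` and `iters=10`, and the example table are all assumptions for this sketch, not the paper's settings. It also assumes every column has at least one observed value. Full MICE implementations additionally draw the regression coefficients from their posterior and support categorical models, which this sketch omits.

```python
import random
import statistics


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x


def _fit(X, y, ridge=1e-6):
    """Least-squares coefficients from (X^T X + ridge*I) beta = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return _solve(XtX, Xty)


def _features(X, i, j):
    """Predictors for column j of row i: an intercept plus every other column."""
    return [1.0] + [v for c, v in enumerate(X[i]) if c != j]


def mice(data, m=5, iters=10, seed=0):
    """Return m completed copies of data (None = missing) via simplified MICE."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    missing = {j: [i for i in range(n_rows) if data[i][j] is None] for j in range(n_cols)}
    observed = {j: [i for i in range(n_rows) if data[i][j] is not None] for j in range(n_cols)}
    completed = []
    for _ in range(m):
        X = [row[:] for row in data]
        for j in range(n_cols):  # initial fill: observed column means
            mean_j = statistics.fmean(data[i][j] for i in observed[j])
            for i in missing[j]:
                X[i][j] = mean_j
        for _ in range(iters):  # chained equations: one regression per incomplete column
            for j in range(n_cols):
                if not missing[j]:
                    continue
                feats = [_features(X, i, j) for i in observed[j]]
                ys = [X[i][j] for i in observed[j]]
                beta = _fit(feats, ys)
                resid = [y - sum(b * f for b, f in zip(beta, fx)) for y, fx in zip(ys, feats)]
                sigma = statistics.pstdev(resid)
                for i in missing[j]:  # prediction plus noise reflects imputation uncertainty
                    pred = sum(b * f for b, f in zip(beta, _features(X, i, j)))
                    X[i][j] = pred + rng.gauss(0.0, sigma)
        completed.append(X)
    return completed


# Made-up example table; None marks a missing value.
data = [
    [1.0, 2.0, None],
    [2.0, None, 6.1],
    [3.0, 6.2, 9.0],
    [4.0, 8.1, 12.2],
    [None, 10.0, 15.1],
    [6.0, 12.1, 17.9],
]
imputations = mice(data, m=3)
for table in imputations:
    print([round(v, 2) for v in (table[0][2], table[1][1], table[4][0])])
```

In multiple imputation, each of the `m` completed tables is analysed separately and the results are then combined; this abstract does not say how the authors pool them.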
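With vertically partitioned data, each party holds different feature columns for the same samples, so for a linear kernel the global Gram matrix is a sum of local ones: K = X Xᵀ = X_1 X_1ᵀ + ... + X_k X_kᵀ. Paillier encryption is additively homomorphic (multiplying two ciphertexts modulo n² gives an encryption of the sum of the plaintexts), so partial Gram matrices can be added by a party that never sees them. The Python sketch below assumes one plausible arrangement, not necessarily the authors' exact protocol: a key holder publishes a Paillier public key, each data owner encrypts its partial Gram matrix, the semi-honest third party multiplies the ciphertexts entry by entry, and the key holder decrypts the aggregate. The linear kernel is an assumption, the hard-coded primes are far too small to be secure, the features are small integers (real-valued features would first need fixed-point scaling), and all function names are hypothetical.

```python
import math
import secrets

# Toy primes for illustration only; real Paillier keys use primes of 1024+ bits.
P, Q = 104729, 1299709


def keygen(p=P, q=Q):
    """Return (n, (lam, mu)) for Paillier with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n as c = (1 + n)^m * r^n mod n^2 = (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def encode(v, n):
    return v % n  # map a signed integer into Z_n


def decode(m, n):
    return m - n if m > n // 2 else m  # map back to a signed integer


def partial_gram(rows):
    """Linear-kernel Gram matrix X_p X_p^T computed locally by one party."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in rows] for xi in rows]


def encrypt_matrix(n, M):
    return [[encrypt(n, encode(v, n)) for v in row] for row in M]


def add_encrypted(n, C1, C2):
    """Third party: entry-wise ciphertext product encrypts the entry-wise sum."""
    n2 = n * n
    return [[a * b % n2 for a, b in zip(r1, r2)] for r1, r2 in zip(C1, C2)]


def decrypt_matrix(n, priv, C):
    return [[decode(decrypt(n, priv, c), n) for c in row] for row in C]


# Three samples, vertically partitioned: party A holds two features, party B one.
X_A = [[1, 2], [0, -1], [3, 1]]
X_B = [[4], [-2], [0]]

n, priv = keygen()
enc_A = encrypt_matrix(n, partial_gram(X_A))  # computed and encrypted by party A
enc_B = encrypt_matrix(n, partial_gram(X_B))  # computed and encrypted by party B
enc_K = add_encrypted(n, enc_A, enc_B)        # third party only handles ciphertexts
K_global = decrypt_matrix(n, priv, enc_K)     # key holder decrypts the aggregate

print(K_global)  # [[21, -10, 5], [-10, 5, -1], [5, -1, 10]]
```

Because the decrypted result equals the Gram matrix of the full feature matrix, an SVM trained on it as a precomputed kernel sees the same inputs as a centralized model. A non-linear kernel such as the RBF kernel would need extra steps, since it is not a plain sum of per-party terms.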

Figs. 1–14 (not reproduced in this preview)

Author information

Corresponding author

Correspondence to Mohammed Z. Omer.

About this article

Cite this article

Omer, M.Z., Gao, H. & Mustafa, N. Privacy-preserving of SVM over vertically partitioned with imputing missing data. Distrib Parallel Databases 35, 363–382 (2017). https://doi.org/10.1007/s10619-017-7203-3

  • DOI: https://doi.org/10.1007/s10619-017-7203-3
