MLChain: a privacy-preserving model learning framework using blockchain

  • Regular Contribution
International Journal of Information Security

Abstract

In this work, we present a blockchain-based, secure, and flexible distributed privacy-preserving online model that helps organizations share key features of their datasets without violating data privacy. In our model, all members are encouraged to participate and discouraged from writing fake data. Learning is carried out without sharing raw data, and the shared data are immutable, which improves prediction results on the data held by each member of an industry. We also propose a new consensus algorithm, Proof of Share, for adding a valid transaction to the blockchain; it prevents non-participating members from reading any of the data shared by a peer and discourages fake writes. We evaluated our model on 3-, 5-, and 10-member setups using decision tree, logistic regression, Gaussian naive Bayes, and support vector machine classifiers. With the results on a member's own data taken as the baseline, the maximum observed increase in accuracy was \(26.9231\%\); the \(F_{\beta }(\beta =0.5)\) score increased by 0.4533 and the \(F_{1}\) score by 0.0800. To the best of our knowledge, the proposed model is the only one that encourages all members to participate rather than remain passive listeners and that discourages members from forging results. This makes it suitable for domains such as health care, finance, and education, where data are unevenly split and the secrecy of data and peers is required.
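The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard definition of the \(F_{\beta }\) score (with \(\beta =0.5\), precision is weighted more heavily than recall) and compares a member's predictions made alone against predictions made after sharing. The label vectors, the function names, and the choice to report the accuracy gain in percentage points are all assumptions made for illustration; they are not taken from the article, and this page does not state how the reported increases were computed.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta < 1 favours precision, beta > 1 favours recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Hypothetical labels and predictions (illustrative only, not data from the article).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_alone = [1, 1, 0, 0, 0, 0, 1, 1]   # member trained on its own data (baseline)
pred_shared = [1, 1, 0, 0, 0, 0, 0, 0]  # same member after learning from shared features

for name, pred in (("alone", pred_alone), ("shared", pred_shared)):
    p, r = precision_recall(y_true, pred)
    print(
        f"{name}: accuracy={accuracy(y_true, pred):.4f} "
        f"F1={f_beta(p, r, 1.0):.4f} F0.5={f_beta(p, r, 0.5):.4f}"
    )

# Assumed definition of the gain: difference in accuracy, in percentage points.
gain_points = 100 * (accuracy(y_true, pred_shared) - accuracy(y_true, pred_alone))
print(f"accuracy gain over baseline: {gain_points:.4f} points")
```

In this toy case the shared model removes false positives without recovering any missed positives, so the \(F_{0.5}\) score rises more than the \(F_{1}\) score, which is the kind of difference the two metrics are meant to expose.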

Data availability

The data are publicly available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State [37]

References

  1. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Technical Report Manubot (2019)

  2. Zheng, Z., Xie, S., Dai, H.-N., Chen, X., Wang, H.: Blockchain challenges and opportunities: a survey. Int. J. Web Grid Serv. 14, 352–375 (2018)

  3. Kuo, T.-T., Ohno-Machado, L.: Modelchain: decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks. arXiv:1802.01746 (2018)

  4. Omar, I.A., Jayaraman, R., Salah, K., Yaqoob, I., Ellahham, S.: Applications of blockchain technology in clinical trials: review and open challenges. Arabian J. Sci. Eng. 46, 3001–3015 (2020)

  5. Ølnes, S., Ubacht, J., Janssen, M.: Blockchain in government: benefits and implications of distributed ledger technology for information sharing. Gov. Inf. Q. 34, 355–364 (2017)

  6. Vacca, A., Di Sorbo, A., Visaggio, C.A., Canfora, G.: A systematic literature review of blockchain and smart contract development: techniques, tools, and open challenges. J. Syst. Softw. 174, 110891 (2021). https://doi.org/10.1016/j.jss.2020.110891

  7. Liu, M., Wu, K., Xu, J.J.: How will blockchain technology impact auditing and accounting: permissionless versus permissioned blockchain. Current Issues Audit. 13, A19–A29 (2019)

  8. Mingxiao, D., Xiaofeng, M., Zhe, Z., Xiangwei, W., Qijun, C.: A review on consensus algorithm of blockchain. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2567–2572. IEEE (2017)

  9. Woo, T.Y., Lam, S.S.: Authentication for distributed systems. Computer 25, 39–52 (1992)

  10. Swain, P.H., Hauska, H.: The decision tree classifier: design and potential. IEEE Trans. Geosci. Electron. 15, 142–147 (1977)

  11. Song, Y.-Y., Ying, L.: Decision tree methods: applications for classification and prediction. Shanghai Arch. Psychiatry 27, 130 (2015)

  12. Wright, R.E.: Logistic regression (1995)

  13. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  14. Langley, P., Iba, W., Thompson, K., et al.: An analysis of Bayesian classifiers. In: AAAI, vol. 90, pp. 223–228. Citeseer (1992)

  15. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the 5th Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)

  16. Wu, Y., Jiang, X., Kim, J., Ohno-Machado, L.: Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. J. Am. Med. Inform. Assoc. 19, 758–764 (2012)

  17. Jiang, W., Li, P., Wang, S., Wu, Y., Xue, M., Ohno-Machado, L., Jiang, X.: Webglore: a web service for grid logistic regression. Bioinformatics 29, 3238–3240 (2013)

  18. Shi, H., Jiang, C., Dai, W., Jiang, X., Tang, Y., Ohno-Machado, L., Wang, S.: Secure multi-pArty computation grid LOgistic REgression (SMAC-GLORE). BMC Med. Inform. Decis. Mak. 16, 175–187 (2016)

  19. Wang, S., Jiang, X., Wu, Y., Cui, L., Cheng, S., Ohno-Machado, L.: Expectation propagation logistic regression (explorer): distributed privacy-preserving online model learning. J. Biomed. Inf. 46, 480–496 (2013)

  20. Li, Y., Jiang, X., Wang, S., Xiong, H., Ohno-Machado, L.: Vertical grid logistic regression (vertigo). J. Am. Med. Inform. Assoc. 23, 570–579 (2016)

  21. Huang, L., Shea, A.L., Qian, H., Masurkar, A., Deng, H., Liu, D.: Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019)

  22. Wang, S., Chang, T.-H.: Federated clustering via matrix factorization models: from model averaging to gradient sharing. arXiv:2002.04930 (2020)

  23. Mohassel, P., Zhang, Y.: Secureml: A system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. IEEE (2017)

  24. Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321 (2015)

  25. Phuong, T.T., et al.: Privacy-preserving deep learning via weight transmission. IEEE Trans. Inf. Forensics Secur. 14, 3003–3015 (2019)

  26. Aono, Y., Hayashi, T., Wang, L., Moriai, S., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13, 1333–1345 (2017)

  27. Brisimi, T.S., Chen, R., Mela, T., Olshevsky, A., Paschalidis, I.C., Shi, W.: Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018)

  28. Duan, M., Liu, D., Chen, X., Tan, Y., Ren, J., Qiao, L., Liang, L.: Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications. In: 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 246–254. IEEE (2019)

  29. Xie, M., Long, G., Shen, T., Zhou, T., Wang, X., Jiang, J.: Multi-center federated learning. arXiv:2005.01026 (2020)

  30. Kim, Y., Hakim, E. A., Haraldson, J., Eriksson, H., Silva Jr., J. M.B.D., Fischione, C.: Dynamic clustering in federated learning. arXiv:2012.03788 (2020)

  31. Choudhury, O., Gkoulalas-Divanis, A., Salonidis, T., Sylla, I., Park, Y., Hsu, G., Das, A.: Differential privacy-enabled federated learning for sensitive health data. arXiv:1910.02578 (2019)

  32. Bouacida, N., Mohapatra, P.: Vulnerabilities in federated learning. IEEE Access 23(9), 63229–49 (2021)

  33. Kuo, T.-T., Kim, J., Gabriel, R.A.: Privacy-preserving model learning on a blockchain network-of-networks. J. Am. Med. Inform. Assoc. 27, 343–354 (2020)

  34. Kuo, T.-T., Gabriel, R.A., Ohno-Machado, L.: Fair compute loads enabled by blockchain: sharing models by alternating client and server roles. J. Am. Med. Inform. Assoc. 26, 392–403 (2019)

  35. Kuo, T.-T., Gabriel, R.A., Cidambi, K.R., Ohno-Machado, L.: Expectation propagation logistic regression on permissioned blockchain (ExplorerChain): decentralized online healthcare/genomics predictive model learning. J. Am. Med. Inform. Assoc. 27, 747–756 (2020)

  36. Kennedy, R.L., Fraser, H.S., McStay, L.N., Harrison, R.F.: Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models. Eur. Heart J. 17(8), 1181–91 (1996)

  37. Dua, D., Graff, C.: UCI machine learning repository. URL:http://archive.ics.uci.edu/ml (2017)

  38. Jere, M.S., Farnan, T., Koushanfar, F.: A taxonomy of attacks on federated learning. IEEE Secur. Privacy 19(2), 20–8 (2020)

  39. Issa, W., Moustafa, N., Turnbull, B., Sohrabi, N., Tari, Z.: Blockchain-based federated learning for securing internet of things: a comprehensive survey. ACM Comput. Surv. 55(9), 1–43 (2023)

  40. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)

  41. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Routledge, Milton Park (2017)

  42. Daemen, J., Rijmen, V.: AES proposal: Rijndael (1999)

  43. Standard, D.E., et al.: Data encryption standard. Federal Information Processing Standards Publication, 112 (1999)

  44. Kim, H., Park, J., Bennis, M., Kim, S.L.: Blockchained on-device federated learning. IEEE Commun. Lett. 24(6), 1279–1283 (2019)

  45. Short, A.R., Leligou, H.C., Papoutsidakis, M., Theocharis, E.: Using blockchain technologies to improve security in federated learning systems. In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1183–1188. IEEE (2020)

  46. Yin, X., Zhu, Y., Hu, J.: A comprehensive survey of privacy-preserving federated learning: a taxonomy, review, and future directions. ACM Comput. Surv. (CSUR) 54(6), 1–36 (2021)

  47. Wei, K., Li, J., Ding, M., Ma, C., Yang, H.H., Farokhi, F., Jin, S., Quek, T.Q., Poor, H.V.: Federated learning with differential privacy: algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 17(15), 3454–69 (2020)

  48. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115. IEEE (2007)

Funding

The study has not been funded by any institute or agency.

Author information

Corresponding author

Correspondence to Mohona Ghosh.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Tables and figures of results on 3-node setup

See Figs. 12, 13, 14, 15 and Tables 11, 12, 13, 14.

Fig. 13 Performance of logistic regression on 3-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Fig. 14 Performance of Gaussian naive Bayes on 3-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Fig. 15 Support vector machine results on 3-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Table 12 Performance of logistic regression on 3-node setup
Table 13 Performance of Gaussian naive Bayes on 3-node setup
Table 14 Performance of support vector machine on 3-node setup

1.2 Figures of results on 5-node setup with a subset of EEG dataset

See Figs. 16, 17, 18, 19.

Fig. 16 Performance of decision tree on 5-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Fig. 17 Performance of logistic regression on 5-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Fig. 18 Performance of Gaussian naive Bayes on 5-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Fig. 19 Support vector machine results on 5-node setup, a accuracy, b increase in accuracy, c \(F_{1}\) score, d \(F_{\beta }(\beta =0.5)\) score

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bansal, V., Baliyan, N. & Ghosh, M. MLChain: a privacy-preserving model learning framework using blockchain. Int. J. Inf. Secur. 23, 649–677 (2024). https://doi.org/10.1007/s10207-023-00754-3

  • DOI: https://doi.org/10.1007/s10207-023-00754-3
