Abstract
Medical image data is naturally distributed among clinical institutions. This partitioning, combined with security and privacy restrictions on medical data, imposes limitations on machine learning algorithms in clinical applications, especially for small and newly established institutions. We present InsuLearn: an intuitive and robust open-source platform (code available at: https://github.com/DistributedML/InsuLearn) designed to facilitate distributed learning (classification and regression) on medical image data, while preserving data security and privacy. InsuLearn is built on ensemble learning, in which statistical models are developed at each institution independently and combined at secure coordinator nodes. InsuLearn protocols are designed such that the liveness of the system is guaranteed as institutions join and leave the network. Coordination is implemented as a cluster of replicated state machines, making it tolerant to individual node failures. We demonstrate that InsuLearn successfully integrates accurate models for horizontally partitioned data while preserving privacy.
This work is supported in part by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.
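The ensemble idea summarized in the abstract (models trained independently at each institution, then combined at a coordinator) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the 1-D nearest-centroid classifier, the unweighted majority vote, and the toy data are hypothetical and are not taken from the InsuLearn code base. The sketch also does not reproduce InsuLearn's protocols, its security measures, or its replicated coordinator nodes.

```python
from collections import Counter
from statistics import mean


def train_local_model(samples):
    """Fit a 1-D nearest-centroid classifier on one institution's data.

    samples: list of (feature, label) pairs held locally.
    Returns a dict mapping each label to the mean feature value (centroid).
    """
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict_local(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def ensemble_predict(models, x):
    """Coordinator-side combination (assumed here): unweighted majority vote."""
    votes = Counter(predict_local(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Three hypothetical institutions, each holding different samples of the
# same feature (a horizontal partition). The values are made up.
institution_data = [
    [(0.1, "benign"), (0.3, "benign"), (0.9, "malignant")],
    [(0.2, "benign"), (0.8, "malignant"), (1.0, "malignant")],
    [(0.0, "benign"), (0.7, "malignant")],
]

# Each model is trained only on its own institution's samples.
models = [train_local_model(d) for d in institution_data]

# The coordinator queries the local models and combines their votes.
prediction = ensemble_predict(models, 0.85)
print(prediction)
```

In this toy setup the coordinator only ever sees the fitted local models, not the raw samples. Whether a given model representation leaks private information is a separate question that the sketch does not address.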
Notes
1. Open-source code available at: https://github.com/DistributedML/InsuLearn.
2. In fact, \(h_i\) does not know the size of \(H\) nor the nodes in \(H\).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Amir-Khalili, A., Kianzad, S., Abugharbieh, R., Beschastnikh, I. (2017). Scalable and Fault Tolerant Platform for Distributed Learning on Private Medical Data. In: Wang, Q., Shi, Y., Suk, HI., Suzuki, K. (eds) Machine Learning in Medical Imaging. MLMI 2017. Lecture Notes in Computer Science, vol 10541. Springer, Cham. https://doi.org/10.1007/978-3-319-67389-9_21
DOI: https://doi.org/10.1007/978-3-319-67389-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67388-2
Online ISBN: 978-3-319-67389-9
eBook Packages: Computer Science, Computer Science (R0)