Scalable and Fault Tolerant Platform for Distributed Learning on Private Medical Data

Amir-Khalili, Alborz; Kianzad, Soheil; Abugharbieh, Rafeef; Beschastnikh, Ivan

doi:10.1007/978-3-319-67389-9_21

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10541)

Included in the following conference series:

International Workshop on Machine Learning in Medical Imaging

Abstract

Medical image data is naturally distributed among clinical institutions. This partitioning, combined with security and privacy restrictions on medical data, imposes limitations on machine learning algorithms in clinical applications, especially for small and newly established institutions. We present InsuLearn: an intuitive and robust open-source platform (code available at: https://github.com/DistributedML/InsuLearn) designed to facilitate distributed learning (classification and regression) on medical image data while preserving data security and privacy. InsuLearn is built on ensemble learning, in which statistical models are developed at each institution independently and combined at secure coordinator nodes. InsuLearn protocols are designed such that the liveness of the system is guaranteed as institutions join and leave the network. Coordination is implemented as a cluster of replicated state machines, making it tolerant to individual node failures. We demonstrate that InsuLearn successfully integrates accurate models for horizontally partitioned data while preserving privacy.
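
The ensemble idea described in the abstract can be illustrated with a minimal sketch. This is not InsuLearn's actual code or API: the class names (CentroidModel, Coordinator), the nearest-centroid local model, and the majority-vote combination rule are all assumptions chosen for brevity. Each institution fits a model on its own horizontally partitioned samples and shares only the fitted model, never the raw data. The coordinator keeps answering queries as institutions join and leave, as long as at least one model remains registered. Replication of the coordinator itself is not modelled here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


class CentroidModel:
    """Hypothetical local model: one mean feature vector per class."""

    def __init__(self, data: List[Sample]) -> None:
        sums: Dict[int, List[float]] = {}
        counts: Counter = Counter()
        for features, label in data:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] += 1
        # Only these per-class means leave the institution, not the samples.
        self.centroids = {
            label: [s / counts[label] for s in acc]
            for label, acc in sums.items()
        }

    def predict(self, features: Sequence[float]) -> int:
        def sq_dist(centroid: Sequence[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


class Coordinator:
    """Hypothetical coordinator: combines local models by majority vote."""

    def __init__(self) -> None:
        self.models: Dict[str, CentroidModel] = {}

    def join(self, institution: str, model: CentroidModel) -> None:
        self.models[institution] = model

    def leave(self, institution: str) -> None:
        # Leaving is a no-op if the institution was never registered.
        self.models.pop(institution, None)

    def predict(self, features: Sequence[float]) -> int:
        if not self.models:
            raise RuntimeError("no institution has contributed a model")
        votes = Counter(m.predict(features) for m in self.models.values())
        # Ties go to the label that was voted for first.
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up toy data: three sites, two features, two classes.
    sites = {
        "site_a": [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([1.0, 0.9], 1)],
        "site_b": [([0.1, 0.2], 0), ([0.9, 1.1], 1), ([1.1, 1.0], 1)],
        "site_c": [([0.0, 0.0], 0), ([1.0, 1.0], 1)],
    }
    coordinator = Coordinator()
    for name, data in sites.items():
        coordinator.join(name, CentroidModel(data))
    print(coordinator.predict([0.9, 0.8]))
    coordinator.leave("site_b")  # the ensemble keeps working with two sites
    print(coordinator.predict([0.9, 0.8]))
```

Majority voting is only one way to combine local models. A weighted vote, for example by local sample count, would fit the same join/leave interface.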

This work is supported in part by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.

Notes

1. Open-source code available at: https://github.com/DistributedML/InsuLearn.
2. In fact \(h_i\) does not know the size of \(H\) nor the nodes in \(H\).

Author information

Authors and Affiliations

Biomedical Signal and Image Computing Lab, University of British Columbia, Vancouver, Canada
Alborz Amir-Khalili & Rafeef Abugharbieh
Computer Science, University of British Columbia, Vancouver, Canada
Soheil Kianzad & Ivan Beschastnikh

Corresponding author

Correspondence to Alborz Amir-Khalili.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Qian Wang
Nanjing University, Nanjing, China
Yinghuan Shi
Korea University, Seoul, Korea (Republic of)
Heung-Il Suk
Illinois Institute of Technology, Chicago, Illinois, USA
Kenji Suzuki

About this paper

Cite this paper

Amir-Khalili, A., Kianzad, S., Abugharbieh, R., Beschastnikh, I. (2017). Scalable and Fault Tolerant Platform for Distributed Learning on Private Medical Data. In: Wang, Q., Shi, Y., Suk, HI., Suzuki, K. (eds) Machine Learning in Medical Imaging. MLMI 2017. Lecture Notes in Computer Science, vol 10541. Springer, Cham. https://doi.org/10.1007/978-3-319-67389-9_21

DOI: https://doi.org/10.1007/978-3-319-67389-9_21
Published: 07 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67388-2
Online ISBN: 978-3-319-67389-9
eBook Packages: Computer Science (R0)
