Abstract
Medical research must cope with the growing volume of patient data made available by modern technologies. These data could support statistical studies and help identify causal relations. Because they are spread across hospitals, using them requires efficient merging techniques as well as policies for handling sensitive information. In this paper we introduce and empirically test a distributed learning approach for training Support Vector Machines (SVMs) that overcomes the problems of privacy and data fragmentation. The technique trains the algorithm without sharing any patient-related information, which preserves privacy and avoids the need to develop merging tools. We tested the approach on a large dataset and report the results in terms of convergence and performance; we also discuss the features of an IT architecture designed to support distributed learning computations.
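The general idea of training without moving patient records can be illustrated with a minimal sketch. It is not the method of the paper, whose algorithm the abstract does not specify; it assumes a linear SVM (hinge loss with L2 regularization) trained by full-batch subgradient descent, where each hospital computes a subgradient on its own data and sends only that aggregate vector and its sample count to a coordinator. All names (`local_subgradient`, `train_distributed_svm`, `make_site`), the synthetic data, and the hyperparameters are hypothetical.

```python
import numpy as np

def local_subgradient(w, X, y):
    """Run inside one site: summed hinge-loss subgradient on local data.

    Only the returned vector and the sample count leave the site;
    the rows of X and the labels y are never transmitted.
    """
    margins = y * (X @ w)
    active = margins < 1.0                       # samples violating the margin
    grad = -(X[active] * y[active, None]).sum(axis=0)
    return grad, len(y)

def train_distributed_svm(sites, dim, lam=0.01, lr=0.1, n_iter=200):
    """Coordinator loop: combine per-site subgradients into one update of w."""
    w = np.zeros(dim)
    for t in range(n_iter):
        grads, counts = zip(*(local_subgradient(w, X, y) for X, y in sites))
        g = sum(grads) / sum(counts) + lam * w   # averaged hinge term + L2 term
        w = w - lr / np.sqrt(t + 1) * g          # decaying step size (assumed)
    return w

def make_site(rng, n, w_true):
    """Synthetic stand-in for one hospital's dataset (assumption for the demo)."""
    X = rng.normal(size=(n, len(w_true)))
    y = np.sign(X @ w_true)
    y[y == 0] = 1.0
    return X, y

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
sites = [make_site(rng, n, w_true) for n in (80, 120, 100)]

w = train_distributed_svm(sites, dim=3)

# Pooling is done here only to evaluate the demo; a real deployment
# would evaluate locally at each site as well.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
accuracy = float(np.mean(np.sign(X_all @ w) == y_all))
print(f"training accuracy: {accuracy:.3f}")
```

Because the summed subgradients equal the subgradient of the pooled objective, this sketch follows the same trajectory as centralized training, up to floating-point rounding. Note that exchanging gradients is not by itself a formal privacy guarantee; it only avoids transferring raw records.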
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Damiani, A. et al. (2015). Distributed Learning to Protect Privacy in Multi-centric Clinical Studies. In: Holmes, J., Bellazzi, R., Sacchi, L., Peek, N. (eds) Artificial Intelligence in Medicine. AIME 2015. Lecture Notes in Computer Science, vol 9105. Springer, Cham. https://doi.org/10.1007/978-3-319-19551-3_8
DOI: https://doi.org/10.1007/978-3-319-19551-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19550-6
Online ISBN: 978-3-319-19551-3
eBook Packages: Computer Science, Computer Science (R0)