Abstract
Predicting off-target binding is a crucial step in assuring drug safety. For oligonucleotide drugs, this requires learning the relevant thermodynamics from data that are often large-scale and distributed across different organisations. Data privacy can be respected if learning is distributed and private, with only limited and private communication between local nodes. We propose an ADMM-based SVM with differential privacy for this purpose. We show empirically that this approach achieves accuracy comparable to that of the non-private approach, i.e. \(\sim 86\%\), while yielding tight empirical privacy guarantees even after convergence.
Notes
1. Here, \(M\) is the number of nodes across which the data is distributed, and \(M \le N\).
2. E.g. the 3-gram “GCG” has a larger weight than “ATA” because of its higher binding affinity.
3. An increase in the privacy level \(\epsilon \) indicates a decrease in differential privacy, i.e. a weaker privacy guarantee.
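The relationship in Note 3 can be illustrated with a small sketch. It uses the standard Laplace mechanism, in which the noise scale is the sensitivity divided by \(\epsilon \); this is an assumption chosen for illustration and not necessarily the mechanism used in the paper. The function names, the sensitivity of 1.0, and the \(\epsilon \) values are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value, sensitivity, epsilon, rng):
    # Standard Laplace mechanism: noise scale = sensitivity / epsilon.
    # (Illustrative assumption; the paper's mechanism may differ.)
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon, sensitivity=1.0, trials=20000, seed=0):
    # Average distortion that the mechanism adds to a fixed true value.
    rng = random.Random(seed)
    true_value = 0.0
    total = 0.0
    for _ in range(trials):
        total += abs(privatize(true_value, sensitivity, epsilon, rng) - true_value)
    return total / trials


if __name__ == "__main__":
    # Hypothetical epsilon values, chosen only to show the trend.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute noise = {mean_abs_error(eps):.3f}")
```

Under these assumptions, the average noise shrinks as \(\epsilon \) grows, so a larger \(\epsilon \) perturbs the released value less and gives a weaker privacy guarantee.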
Acknowledgments
SSF Strategic Mobility Grant “Drug Discovery for Antisense Oligos” (A.S.), Swedish National Supercomputer Centre (A.S. & S.T.).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Tavara, S., Schliep, A., Basu, D. (2021). Federated Learning of Oligonucleotide Drug Molecule Thermodynamics with Differentially Private ADMM-Based SVM. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_34
DOI: https://doi.org/10.1007/978-3-030-93733-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93732-4
Online ISBN: 978-3-030-93733-1
eBook Packages: Computer Science, Computer Science (R0)