Abstract
Privacy has become a major issue in distributed learning. Current approaches that do not use a trusted external authority either reduce the accuracy of the learning algorithm (e.g., by adding noise) or incur a high performance penalty. We propose a methodology for private distributed ML from lightweight cryptography (in short, PD-ML-Lite). We apply our methodology to two major ML algorithms, namely non-negative matrix factorization (NMF) and singular value decomposition (SVD). Our protocols are communication-optimal, achieve the same accuracy as their non-private counterparts, and satisfy a notion of privacy, which we define, that is both intuitive and measurable. We use light cryptographic tools (multi-party secure sum and normed secure sum) to build learning algorithms rather than wrapping complex learning algorithms in a heavy multi-party computation (MPC) framework.
We showcase our algorithms’ utility and privacy for NMF on topic modeling and recommender systems, and for SVD on principal component regression and low-rank approximation.
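To make the "multi-party secure sum" primitive named in the abstract concrete, the following is a minimal sketch of one textbook construction based on additive secret sharing. It is not taken from the paper: the modulus, the function names `make_shares` and `secure_sum`, and the assumption of honest-but-curious parties holding non-negative integer inputs are all illustrative choices, and the protocol the authors actually use may differ.

```python
import secrets

# Assumed public modulus; it must exceed any possible true sum (illustrative choice).
MODULUS = 2**61 - 1


def make_shares(value, num_parties, modulus=MODULUS):
    """Split `value` into additive shares that sum to `value` modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(num_parties - 1)]
    # The last share is fixed so that all shares add up to the value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values, modulus=MODULUS):
    """Simulate additive-sharing secure sum in a single process (hypothetical helper)."""
    m = len(private_values)
    # Party i splits its value and sends share j to party j.
    sent = [make_shares(v, m, modulus) for v in private_values]
    # Party j publishes only the sum of the shares it received.
    partials = [sum(sent[i][j] for i in range(m)) % modulus for j in range(m)]
    # The published partial sums reveal the total but not the individual inputs.
    return sum(partials) % modulus


values = [12, 7, 30]
total = secure_sum(values)
```

Each individual share is uniformly random, so in this sketch a single party learns nothing beyond the final total; a real deployment would also need authenticated channels and an encoding for real-valued data.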
The full version of this work is available at [TMIZ19].
M. Tsikhanovich and M. Ishaq—Work done in part while the author was at Rensselaer Polytechnic Institute.
V. Zikas—This work was done in part while the author was at Rensselaer Polytechnic Institute and UCLA and supported in part by DARPA and SPAWAR under contract N66001-15-C-4065 and by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2019-1902070008. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
Notes
- 1.
Centralized refers to the optimal (non-private) outcome where all data is aggregated for learning.
- 2.
Recall that this requirement renders DP unacceptable.
- 3.
- 4.
Theoretical results for DP (e.g., [BBFM12]) apply only to simple mechanisms. Compositions of these simple mechanisms need to be examined case by case (e.g., in one-party differentially private NMF, [LWS15] incurs a 19% loss in learning quality when strict DP is satisfied, even for \(\epsilon =0.25\)). In the M-party setting, because of a possible difference attack across successive iterations, each party must add noise to all observables it emits in every iteration [RA12, HXZ15]. The empirical impact of this added noise is disastrous.
- 5.
One can adapt PD-NMF to accommodate non-observed entries.
References
20 news groups dataset
Balcan, M.F., Blum, A., Fine, S., Mansour, Y.: Distributed learning, communication complexity and privacy. In: Conference on Learning Theory, p. 26-1 (2012)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 128–138. ACM (2005)
Bendlin, R., Damgard, I., Orlandi, C., Zakarias, S.: Semi-homomorphic encryption and multiparty computation. Cryptology ePrint Archive, Report 2010/514 (2010)
Bassily, R., Freund, Y.: Typical stability. arXiv preprint arXiv:1604.03336 (2016)
Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for nonnegative matrix factorization. Pattern Recogn. 41, 1350–1362 (2008)
Bassily, R., Groce, A., Katz, J., Smith, A.: Coupled-worlds privacy: exploiting adversarial uncertainty in statistical data privacy. In: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), pp. 439–448. IEEE (2013)
Bertin-Mahieux, T., Ellis, D.P.W., Whitman, B., Lamere, P.: The million song dataset. In: ISMIR (2011)
Berry, M., Mezher, D., Philippe, B., Sameh, A.: Parallel computation of the singular value decomposition. Ph.D. thesis, INRIA (2003)
Brickell, J., Shmatikov, V.: The cost of privacy: destruction of data-mining utility in anonymized data publishing. In: SIGKDD (2008)
Chen, S., Lu, R., Zhang, J.: A flexible privacy-preserving framework for singular value decomposition under internet of things environment. CoRR, abs/1703.06659 (2017)
Condat, L.: Fast projection onto the simplex and the \(\ell _1\) ball. Math. Program. 158, 575–585 (2016)
Cichocki, A., Zdunek, R., Amari, S.: Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In: Davies, M.E., James, C.J., Abdallah, S.A., Plumbley, M.D. (eds.) ICA 2007. LNCS, vol. 4666, pp. 169–176. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74494-8_22
Du, S.S., Liu, Y., Chen, B., Li, L.: Maxios: large scale nonnegative matrix factorization for collaborative filtering. In: NIPS 2014 Workshop on Distributed Matrix Computations (2014)
Damgard, I., Pastro, V., Smart, N.P., Zakarias, S.: Multiparty computation from somewhat homomorphic encryption. Cryptology ePrint Archive, Report 2011/535 (2011)
Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 211–407 (2014)
Demmler, D., Schneider, T., Zohner, M.: ABY - a framework for efficient mixed-protocol secure two-party computation. In: 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, 8–11 February 2015 (2015)
Bag of words datasets
Fernandes, K., Vinagre, P., Cortez, P.: A proactive intelligent decision support system for predicting the popularity of online news. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds.) EPIA 2015. LNCS (LNAI), vol. 9273, pp. 535–546. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23485-4_53
Gillis, N.: Successive nonnegative projection algorithm for robust nonnegative blind source separation. SIIMS (2014)
Gemulla, R., Nijkamp, E., Haas, P.J., Sismanis, Y.: Large-scale matrix factorization with distributed stochastic gradient descent. In: KDD (2011)
Groce, A.D.: New notions and mechanisms for statistical privacy. Ph.D. thesis, University of Maryland (2014)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. U.S.A 101(Suppl. 1), 5228–5235 (2004)
Golub, G.H., Van Loan, C.F.: Matrix Computations, vol. 3. JHU Press (2012)
Han, S., Ng, W.K., Yu, P.S.: Privacy-preserving singular value decomposition. In: Proceedings of the ICDE, March 2009
Ho, N.-D.: Nonnegative matrix factorization algorithms and applications. Ph.D. thesis, École Polytechnique (2008)
Hardt, M., Roth, A.: Beyond worst-case analysis in private singular vector computation. In: Proceedings of the STOC. ACM (2013)
Hua, J., Xia, C., Zhong, S.: Differentially private matrix factorization. In: IJCAI, pp. 1763–1770 (2015)
Iwen, M.A., Ong, B.W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl. 37(4), 1699–1718 (2016)
Jolliffe, I.T.: A note on the use of principal components in regression. Appl. Stat. 31, 300–303 (1982)
Kairouz, P.: The fundamental limits of statistical data privacy. Ph.D. thesis, University of Illinois at Urbana-Champaign (2016)
Kim, S., Kim, J., Koo, D., Kim, Y., Yoon, H., Shin, J.: Efficient privacy-preserving matrix factorization via fully homomorphic encryption: extended abstract. In: AsiaCCS (2016)
Keller, M., Pastro, V., Rotaru, D.: Overdrive: making SPDZ great again. Cryptology ePrint Archive, Report 2017/1230 (2017)
Kumar, A., Sindhwani, V., Kambadur, P.: Fast conical hull algorithms for near-separable non-negative matrix factorization. arXiv preprint arXiv:1210.1190 (2012)
Lin, C.-J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19, 2756–2779 (2007)
Lee, D.D., Sebastian Seung, H.: Algorithms for non-negative matrix factorization. In: NIPS (2001)
Limbeck, P., Suntinger, M., Schiefer, J.: SARI OpenRec - empowering recommendation systems with business events. In: DBKDA (2010)
Liu, Z., Wang, Y.-X., Smola, A.J.: Fast differentially private matrix factorization. CoRR (2015)
Markovsky, I.: Low Rank Approximation: Algorithms, Implementation, Applications. Springer, Heidelberg (2011)
Malin, B.A., El Emam, K., O’keefe, C.M.: Biomedical data privacy: problems, perspectives, and recent advances. JAMIA (2013)
Mazloom, S., Dov Gordon, S.: Secure computation with differentially private access patterns. In: Lie, D., Mannan, M., Backes, M., Wang, X. (eds.) ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, pp. 490–507. ACM (2018)
Movielens 1m dataset
Mohassel, P., Rindal, P.: \({\rm Aby}^3\): a mixed protocol framework for machine learning. In: Lie, D., Mannan, M., Backes, M., Wang, X. (eds.) ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, pp. 35–52. ACM (2018)
Mohassel, P., Zhang, Y.: SecureML: a system for scalable privacy-preserving machine learning. In: IEEE Symposium on Security and Privacy, SP 2017, pp. 19–38 (2017)
Nikolaenko, V., Ioannidis, S., Weinsberg, U., Joye, M., Taft, N., Boneh, D.: Privacy-preserving matrix factorization. In: SIGSAC (2013)
Nielsen, J.B., Nordholt, P.S., Orlandi, C., Burra, S.S.: A new approach to practical active-secure two-party computation. Cryptology ePrint Archive, Report 2011/091 (2011)
Parlett, B.: The Symmetric Eigenvalue Problem. SIAM (1998)
Rajkumar, A., Agarwal, S.: A differentially private stochastic gradient descent algorithm for multiparty classification. In: Artificial Intelligence and Statistics, pp. 933–941 (2012)
Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 399–408. ACM (2015)
Tsikhanovich, M., Magdon-Ismail, M., Ishaq, M., Zikas, V.: PD-ML-Lite: private distributed machine learning from lightweight cryptography. CoRR, abs/1901.07986 (2019)
Tang, C., Xu, Z., Dwarkadas, S.: Peer-to-peer information retrieval using self-organizing semantic overlay networks. In: SIGCOMM (2003)
Vavasis, S.A.: On the complexity of nonnegative matrix factorization. SIAM J. Optim. 20(3), 1364–1377 (2009)
Wellek, S.: Testing Statistical Hypotheses of Equivalence and Noninferiority, 2nd edn. CRC Press, Boca Raton (2010)
Wang, Y.-X., Fienberg, S., Smola, A.: Privacy for free: posterior sampling and stochastic gradient Monte Carlo. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 2493–2502 (2015)
Won, H.-S., Kim, S.-P., Lee, S., Choi, M.-J., Moon, Y.-S.: Secure principal component analysis in multiple distributed nodes. Secur. Commun. Netw. 9(14), 2348–2358 (2016)
Yi, X., Allan, J.: A comparative study of utilizing topic models for information retrieval. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 29–41. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00958-7_6
Youdao, N.: P4P: practical large-scale privacy-preserving distributed computation robust against malicious users. In: Proceedings of the USENIX (2010)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tsikhanovich, M., Magdon-Ismail, M., Ishaq, M., Zikas, V. (2019). PD-ML-Lite: Private Distributed Machine Learning from Lightweight Cryptography. In: Lin, Z., Papamanthou, C., Polychronakis, M. (eds) Information Security. ISC 2019. Lecture Notes in Computer Science(), vol 11723. Springer, Cham. https://doi.org/10.1007/978-3-030-30215-3_8
DOI: https://doi.org/10.1007/978-3-030-30215-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30214-6
Online ISBN: 978-3-030-30215-3
eBook Packages: Computer Science (R0)