Data Sharing via Differentially Private Coupled Matrix Factorization

Authors:
Beyza Ermiş, Amazon Research, Berlin, Germany
A. Taylan Cemgil, Boğaziçi University, Istanbul, Turkey

ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 3, Article No. 28, pp. 1–27. https://doi.org/10.1145/3372408

Published: 13 May 2020

Abstract

We address the privacy-preserving data-sharing problem in a distributed multiparty setting. In this setting, each data site owns a distinct part of a dataset, and the aim is to estimate the parameters of a statistical model conditioned on the complete data without any site revealing information about the individuals in its own part. The sites want to maximize the utility of the collective data analysis while providing privacy guarantees for their own portion of the data as well as for each participating individual. Our first contribution is to classify these privacy requirements as (i) site-level and (ii) user-level differential privacy and to present formal privacy guarantees for both cases under the model of differential privacy. To satisfy a stronger form of privacy, we use local differential privacy, a variant of differential privacy in which the sensitive data is perturbed with a randomized response mechanism prior to estimation. In this study, we assume that the data instances partitioned among the parties are arranged as matrices. A natural statistical model for this distributed scenario is coupled matrix factorization. We present two generic frameworks for privatizing Bayesian inference for coupled matrix factorization models that guarantee the proposed differential privacy notions according to the privacy requirements of the model. To privatize Bayesian inference, we first exploit the connection between differential privacy and sampling from a Bayesian posterior via stochastic gradient Langevin dynamics and then derive an efficient coupled matrix factorization method. In the local privacy context, we propose two models that have an additional privatization mechanism to achieve a stronger measure of privacy and introduce a Gibbs sampling based algorithm. We demonstrate that the proposed methods provide good prediction accuracy on synthetic and real datasets while adhering to the introduced privacy constraints.
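
To make the sampling idea in the abstract concrete, the following is a minimal sketch of stochastic gradient Langevin dynamics applied to a coupled matrix factorization in which two sites hold matrices X1 and X2 that share the row factor W. It is not the authors' algorithm: the Gaussian likelihood and prior, the function names `sgld_coupled_mf` and `rmse`, and all hyperparameter values are assumptions chosen for illustration. In particular, the sketch does not clip gradients or calibrate the step size to a privacy budget, so on its own it carries no differential privacy guarantee.

```python
import numpy as np


def sgld_coupled_mf(X1, X2, rank=2, steps=2000, step_size=1e-3,
                    batch_frac=0.5, noise_var=1.0, prior_var=10.0, seed=0):
    """Draw one approximate posterior sample of (W, H1, H2) with SGLD.

    Assumed model (illustrative only): X1 ~ N(W @ H1, noise_var) and
    X2 ~ N(W @ H2, noise_var), with N(0, prior_var) priors on every
    factor entry. W is the factor shared by both sites.
    """
    rng = np.random.default_rng(seed)
    n_rows = X1.shape[0]
    W = 0.1 * rng.standard_normal((n_rows, rank))
    H1 = 0.1 * rng.standard_normal((rank, X1.shape[1]))
    H2 = 0.1 * rng.standard_normal((rank, X2.shape[1]))

    for _ in range(steps):
        # Random entry masks act as minibatches; dividing by batch_frac
        # keeps the gradient estimate unbiased.
        M1 = rng.random(X1.shape) < batch_frac
        M2 = rng.random(X2.shape) < batch_frac
        R1 = M1 * (X1 - W @ H1) / (noise_var * batch_frac)
        R2 = M2 * (X2 - W @ H2) / (noise_var * batch_frac)

        # Stochastic gradients of the log posterior. The shared factor W
        # receives contributions from both sites.
        grad_W = R1 @ H1.T + R2 @ H2.T - W / prior_var
        grad_H1 = W.T @ R1 - H1 / prior_var
        grad_H2 = W.T @ R2 - H2 / prior_var

        # Langevin update: half a gradient step plus Gaussian noise whose
        # variance equals the step size.
        scale = np.sqrt(step_size)
        W = W + 0.5 * step_size * grad_W + scale * rng.standard_normal(W.shape)
        H1 = H1 + 0.5 * step_size * grad_H1 + scale * rng.standard_normal(H1.shape)
        H2 = H2 + 0.5 * step_size * grad_H2 + scale * rng.standard_normal(H2.shape)

    return W, H1, H2


def rmse(X, W, H):
    """Root-mean-square reconstruction error of X by W @ H."""
    return float(np.sqrt(np.mean((X - W @ H) ** 2)))


# Usage on synthetic data (sizes and noise level are arbitrary choices).
data_rng = np.random.default_rng(1)
W_true = data_rng.standard_normal((40, 2))
X1_demo = W_true @ data_rng.standard_normal((2, 30)) + 0.1 * data_rng.standard_normal((40, 30))
X2_demo = W_true @ data_rng.standard_normal((2, 20)) + 0.1 * data_rng.standard_normal((40, 20))
W_s, H1_s, H2_s = sgld_coupled_mf(X1_demo, X2_demo, rank=2)
print("RMSE on X1:", rmse(X1_demo, W_s, H1_s))
print("RMSE on X2:", rmse(X2_demo, W_s, H2_s))
```

The point of the sketch is the coupling: because `grad_W` sums the residual terms from both matrices, information from each site shapes the shared factor, while the noise injected at every step is the ingredient that the abstract connects to differential privacy.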

Index Terms

1. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections

Published in
ACM Transactions on Knowledge Discovery from Data Volume 14, Issue 3
June 2020
381 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3388473
Editors:
Charu Aggarwal, IBM T. J. Watson Research, USA
Xindong Wu, Mininglamp Academy of Sciences, China
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 May 2020
- Online AM: 7 May 2020
- Accepted: 1 November 2019
- Revised: 1 September 2019
- Received: 1 March 2019

Author Tags
Differential privacy
Markov Chain Monte Carlo (MCMC)
collective matrix factorization
distributed data
local differential privacy
stochastic gradient Langevin dynamics (SGLD)
Qualifiers
- research-article
- Research
- Refereed