Abstract
We address the privacy-preserving data-sharing problem in a distributed multiparty setting. In this setting, each data site owns a distinct part of a dataset, and the aim is to estimate the parameters of a statistical model conditioned on the complete data without any site revealing information about the individuals in its own part. The sites want to maximize the utility of the collective data analysis while providing privacy guarantees both for their own portion of the data and for each participating individual. Our first contribution is to classify these privacy requirements as (i) site-level and (ii) user-level differential privacy and to present formal privacy guarantees for both cases under the model of differential privacy. To satisfy a stronger form of differential privacy, we use local differential privacy, a variant in which the sensitive data are perturbed with a randomized response mechanism before estimation. In this study, we assume that the data partitioned among the parties are arranged as matrices. A natural statistical model for this distributed scenario is coupled matrix factorization. We present two generic frameworks for privatizing Bayesian inference for coupled matrix factorization models; depending on the privacy requirements of the model, these frameworks guarantee the proposed differential privacy notions. To privatize Bayesian inference, we first exploit the connection between differential privacy and sampling from a Bayesian posterior via stochastic gradient Langevin dynamics and then derive an efficient coupled matrix factorization method. In the local privacy setting, we propose two models that add a further privatization mechanism to achieve a stronger measure of privacy, and we introduce a Gibbs-sampling-based algorithm.
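The randomized response step can be illustrated for the simplest case of binary matrix entries. The sketch that follows is hypothetical and is not the mechanism used in the paper: the helper `randomized_response` keeps each bit with probability e^ε/(1 + e^ε) and reports its complement otherwise (Warner-style randomized response). Because the probability of any report under the two possible true values differs by a factor of at most e^ε, each entry is released under ε-local differential privacy. The second hypothetical helper, `debiased_mean`, inverts the known flipping probability to obtain an unbiased estimate of the true fraction of ones from the noisy reports.

```python
import numpy as np

def randomized_response(bits, epsilon, rng):
    """Hypothetical helper: release a 0/1 array under epsilon-local DP.

    Each entry is kept with probability q = e^eps / (1 + e^eps) and
    flipped otherwise, so P(report | x=1) / P(report | x=0) <= e^eps.
    """
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(bits.shape) < keep_prob
    return np.where(keep, bits, 1 - bits)

def debiased_mean(reported, epsilon):
    """Hypothetical helper: unbiased estimate of the true fraction of ones.

    E[report] = (1 - q) + p * (2q - 1), so p = (E[report] - (1 - q)) / (2q - 1).
    """
    q = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (reported.mean() - (1.0 - q)) / (2.0 * q - 1.0)
```

With ε = 1, each entry is reported truthfully with probability of about 0.73, so individual reports are noisy, yet the debiased mean over many entries stays close to the true fraction. Smaller ε gives stronger privacy but inflates the variance of the estimate by a factor of 1/(2q - 1)^2.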
We demonstrate that the proposed methods achieve good prediction accuracy on synthetic and real datasets while adhering to the introduced privacy constraints.
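Posterior sampling with stochastic gradient Langevin dynamics (SGLD) for a coupled matrix factorization can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes two real-valued matrices `X1` and `X2` that share their rows (for example, the same individuals observed by two sites), a Gaussian likelihood X1 ≈ U V1^T and X2 ≈ U V2^T with noise variance `sigma2`, and zero-mean Gaussian priors with precision `prior_prec` on all factors. The function name `sgld_coupled_mf` and all of its parameters are hypothetical. Each step subsamples a minibatch B of rows, rescales the minibatch likelihood gradient by n/|B| so that the gradient estimate is unbiased, takes half a step along it, and adds Gaussian noise whose variance equals the step size, following the SGLD update of Welling and Teh. The argument that links posterior sampling to differential privacy relies on extra conditions, such as bounded or clipped per-record gradients and a suitable step size, which this sketch does not enforce.

```python
import numpy as np

def sgld_coupled_mf(X1, X2, rank, n_iter=2000, batch_size=50, step=1e-3,
                    sigma2=0.25, prior_prec=1.0, burn_in=1000, seed=0):
    """Hypothetical SGLD sketch for X1 ~ N(U V1^T, sigma2), X2 ~ N(U V2^T, sigma2)
    with a shared row factor U and N(0, 1/prior_prec) priors on U, V1, V2.
    Returns the averages of U V1^T and U V2^T over post-burn-in samples."""
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))
    V1 = 0.1 * rng.standard_normal((X1.shape[1], rank))
    V2 = 0.1 * rng.standard_normal((X2.shape[1], rank))
    mean1 = np.zeros(X1.shape)
    mean2 = np.zeros(X2.shape)
    scale = n / batch_size     # rescales the minibatch likelihood to all n rows
    noise_std = np.sqrt(step)  # SGLD injects N(0, step) noise per coordinate
    for t in range(n_iter):
        B = rng.choice(n, size=batch_size, replace=False)
        # Residuals of the sampled rows under the current factors.
        R1 = X1[B] - U[B] @ V1.T
        R2 = X2[B] - U[B] @ V2.T
        # Unbiased stochastic gradient of the log posterior:
        # prior term on every parameter + rescaled minibatch likelihood term.
        gU = -prior_prec * U
        gU[B] += scale * (R1 @ V1 + R2 @ V2) / sigma2
        gV1 = -prior_prec * V1 + scale * (R1.T @ U[B]) / sigma2
        gV2 = -prior_prec * V2 + scale * (R2.T @ U[B]) / sigma2
        # Langevin update: half a step along the gradient plus Gaussian noise.
        U = U + 0.5 * step * gU + noise_std * rng.standard_normal(U.shape)
        V1 = V1 + 0.5 * step * gV1 + noise_std * rng.standard_normal(V1.shape)
        V2 = V2 + 0.5 * step * gV2 + noise_std * rng.standard_normal(V2.shape)
        if t >= burn_in:
            # Accumulate reconstructions; products are invariant to factor rotations.
            mean1 += U @ V1.T
            mean2 += U @ V2.T
    n_kept = n_iter - burn_in
    return mean1 / n_kept, mean2 / n_kept
```

A call such as `sgld_coupled_mf(X1, X2, rank=2)` returns posterior-mean reconstructions of both matrices. The shared factor `U` is what couples the two matrices: rows that are informative in `X1` also sharpen the estimate used for `X2`. The step size must stay small relative to the curvature of the log posterior (roughly 1/`sigma2` times the squared scale of the factors); otherwise the Langevin iterates diverge.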
Index Terms
- Data Sharing via Differentially Private Coupled Matrix Factorization