Data Sharing via Differentially Private Coupled Matrix Factorization

Published: 13 May 2020

Abstract

We address the privacy-preserving data-sharing problem in a distributed multiparty setting. In this setting, each data site owns a distinct part of a dataset, and the aim is to estimate the parameters of a statistical model conditioned on the complete data without any site revealing information about the individuals in its own part. The sites want to maximize the utility of the collective data analysis while providing privacy guarantees both for their own portion of the data and for each participating individual. Our first contribution is to classify these privacy requirements as (i) site-level and (ii) user-level differential privacy and to present formal privacy guarantees for both cases under the model of differential privacy. To satisfy a stronger form of privacy, we also use local differential privacy, a variant of differential privacy in which the sensitive data is perturbed with a randomized response mechanism prior to estimation. In this study, we assume that the data instances partitioned among the parties are arranged as matrices. A natural statistical model for this distributed scenario is coupled matrix factorization. We present two generic frameworks for privatizing Bayesian inference for coupled matrix factorization models that can guarantee the proposed differential privacy notions according to the privacy requirements of the model. To privatize Bayesian inference, we first exploit the connection between differential privacy and sampling from a Bayesian posterior via stochastic gradient Langevin dynamics and then derive an efficient coupled matrix factorization method. In the local privacy context, we propose two models that have an additional privatization mechanism to achieve a stronger measure of privacy, and we introduce a Gibbs sampling based algorithm. We demonstrate that the proposed methods provide good prediction accuracy on synthetic and real datasets while adhering to the introduced privacy constraints.
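
As a rough illustration of the local privacy idea mentioned in the abstract, the sketch below implements the classic binary randomized response mechanism. The abstract does not specify the exact mechanism or data type used in the article, so everything here is an assumption made for illustration: binary data, the function names `keep_probability`, `randomized_response`, and `debiased_mean`, and the choice of epsilon. Each bit is reported truthfully with probability p = e^ε / (1 + e^ε) and flipped otherwise, so the ratio of report probabilities for any two inputs is at most e^ε, which is the ε-local differential privacy condition.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-local DP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability p, the flipped bit otherwise."""
    p = keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def debiased_mean(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the mean of the true bits from noisy reports.

    Requires epsilon > 0, otherwise the reports carry no information.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[report] = p * m + (1 - p) * (1 - m); solve for the true mean m.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, illustration only
    epsilon = math.log(3.0)         # hypothetical budget; keep probability 0.75
    true_bits = [1] * 300 + [0] * 700
    reports = [randomized_response(b, epsilon, rng) for b in true_bits]
    print("estimated mean of true bits:", debiased_mean(reports, epsilon))
```

The perturbation happens before any estimation, so whoever collects the reports never sees the raw bits; the debiasing step shows why an analyst can still recover aggregate statistics from the perturbed data.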

    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 3
      June 2020
      381 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/3388473

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 May 2020
      • Online AM: 7 May 2020
      • Accepted: 1 November 2019
      • Revised: 1 September 2019
      • Received: 1 March 2019

      Qualifiers

      • research-article
      • Research
      • Refereed
