Abstract
The protection of private and sensitive data is an important problem of increasing interest due to the vast amount of personal data collected. Differential Privacy is arguably the dominant approach to privacy protection and is currently implemented in both industry and government. In the decentralized paradigm, the sensitive information belonging to each individual is locally transformed by a known privacy-maintaining mechanism Q. The objective of the analyst is then to recover the distribution of the raw data, or some functionals of it, while only having access to the transformed data. In this work, we propose a Bayesian nonparametric methodology to perform inference on the distribution of the sensitive data, reformulating the differentially private estimation problem as a latent variable Dirichlet Process mixture model. This methodology has the advantage that it can be applied to any mechanism Q and works as a “black box” procedure, estimating the distribution and functionals thereof from the same MCMC draws and with very little tuning. Moreover, being fully nonparametric, it requires very few assumptions on the distribution of the raw data. For the most popular mechanisms Q, such as Laplace and Gaussian, we describe efficient specialized MCMC algorithms and provide theoretical guarantees. Experiments on both synthetic and real datasets show good performance of the proposed method.
References
Antoniak, C.E.: Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 2(6), 1152–1174 (1974)
Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 273–282. ACM (2007)
Borgs, C., Chayes, J., Smith, A.: Private graphon estimation for sparse graphs. In: Advances in Neural Information Processing Systems, pp. 1369–1377 (2015)
Borgs, C., Chayes, J., Smith, A., Zadik, I.: Revealing network structure, confidentially: improved rates for node-private graphon estimation. In: 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 533–543. IEEE (2018)
Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Minimax optimal procedures for locally private estimation. J. Am. Stat. Assoc. 113(521), 182–201 (2018)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Eland, A.: Tackling urban mobility with technology. Google Europe Blog, 18 November 2015
Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM (2014)
Fienberg, S.E., Rinaldo, A., Yang, X.: Differential privacy and the risk-utility tradeoff for multi-dimensional Contingency Tables. In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 187–199. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15838-4_17
Gaboardi, M., Lim, H.W., Rogers, R.M., Vadhan, S.P.: Differentially private chi-squared hypothesis testing: goodness of fit and independence testing. In: Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), vol. 48. JMLR (2016)
Gaboardi, M., Rogers, R.: Local private hypothesis testing: chi-square tests. arXiv preprint arXiv:1709.07155 (2017)
Gao, F., van der Vaart, A., et al.: Posterior contraction rates for deconvolution of Dirichlet-Laplace mixtures. Electron. J. Stat. 10(1), 608–627 (2016)
Ghosal, S., Van der Vaart, A.: Fundamentals of Nonparametric Bayesian Inference, vol. 44. Cambridge University Press, Cambridge (2017)
Karwa, V., Slavković, A., et al.: Inference using noisy degrees: differentially private \(\beta \)-model and synthetic graphs. Ann. Stat. 44(1), 87–112 (2016)
Kasiviswanathan, S.P., Nissim, K., Raskhodnikova, S., Smith, A.: Analyzing graphs with node differential privacy. In: Sahai, A. (ed.) TCC 2013. LNCS, vol. 7785, pp. 457–476. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36594-2_26
Lo, A.Y.: On a class of Bayesian nonparametric estimates: I. Density estimates. Ann. Stat. 12, 351–357 (1984)
Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J., Vilhuber, L.: Privacy: theory meets practice on the map. In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pp. 277–286. IEEE Computer Society (2008)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 2007 IEEE 48th Annual Symposium on Foundations of Computer Science (FOCS 2007), pp. 94–103. IEEE (2007)
Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9(2), 249–265 (2000)
Nguyen, X., et al.: Convergence of latent mixing measures in finite and infinite mixture models. Ann. Stat. 41(1), 370–400 (2013)
Rinott, Y., O’Keefe, C.M., Shlomo, N., Skinner, C., et al.: Confidentiality and differential privacy in the dissemination of frequency tables. Stat. Sci. 33(3), 358–385 (2018)
Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-71050-9
Wang, Y., Lee, J., Kifer, D.: Revisiting differentially private hypothesis tests for categorical data. arXiv preprint arXiv:1511.03376 (2015)
Wasserman, L., Zhou, S.: A statistical framework for differential privacy. J. Am. Stat. Assoc. 105(489), 375–389 (2010)
Appendices
Appendix A: Algorithm for Laplace Mechanism
In this section, we derive the posterior \(\mathbb {P}(dX^{*}_{k}|Z_{j_{1}:j_{n_k}})\) in the case of the Laplace mechanism. Together with Algorithm 2 in the main text, this posterior yields an efficient MCMC algorithm for posterior estimation when the Laplace mechanism has been applied to the original data. We remark that, even though the posterior (4) might look complicated at first glance, it is just a mixture distribution: for most choices of \(P_0\), it is easy to compute the weights of this mixture and to sample from it. After the proof of Proposition A, we detail a specific example of (4) for \(P_0\) Gaussian, which is used in the experiments. The parameters r and \(\lambda _\alpha \) are chosen as in [5] so that the Laplace mechanism satisfies Differential Privacy.
Proposition A
(Posterior with Laplace Mech.). Let \(r > 0\) and let \(\varPi _{[-r,r]}\) denote the projection operator onto \([-r,r]\), defined as \(\varPi _{[-r,r]}(x) = \max (-r, \min (x, r))\). Let \(Z_{i} | X_{i} \sim \text {Laplace} (\varPi _{[-r,r]}(X_{i}), \lambda _\alpha )\), \(i=1,\ldots ,n\), and let \(Z_{j_1},\ldots ,Z_{j_{n_k}}\) denote the \(n_k\) observations currently assigned to cluster k, i.e. with \(c_{j_i}=k\), assumed w.l.o.g. to be ordered increasingly. Let also \(i_- := \min \{ i\ |\ Z_{j_i} \ge -r\}\) (\(i_- = n_k+1\) if the set is empty) and \(i_+ := \max \{ i\ |\ Z_{j_i} \le r\}\) (\(i_+ = 0\) if the set is empty), and set \(\widetilde{Z}_{i_- - 1} = -r\), \(\widetilde{Z}_{i_+ + 1} = r\) and \(\widetilde{Z}_i = Z_{j_i}\) for \(i \in [i_-, i_+]\). Then, the posterior distribution \(\mathbb {P}(dX^{*}_{k}|Z_{j_{1}:j_{n_k}})\) is proportional to
$$\begin{aligned} \left[ e^{-\frac{1}{\lambda _\alpha }\sum \limits _{i=1}^{n}|Z_{j_i}+r|}\, \mathbf {1}_{\{X^{*}_{k}< -r\}} \;+\; \sum \limits _{j=i_- -1}^{i_+} C_j\, e^{\frac{(n-2j)}{\lambda _\alpha }X^{*}_{k}}\, \mathbf {1}_{\{\widetilde{Z}_{j}< X^{*}_{k} < \widetilde{Z}_{j+1}\}} \;+\; e^{-\frac{1}{\lambda _\alpha }\sum \limits _{i=1}^{n}|Z_{j_i}-r|}\, \mathbf {1}_{\{X^{*}_{k}> r\}} \right] P_0(dX^{*}_{k}) \end{aligned}$$
(4)
where \(C_j = e^{\frac{1}{\lambda _\alpha } \left( \sum \limits _{i=1}^{j} \widetilde{Z}_{i} - \sum \limits _{i = j+1}^n \widetilde{Z}_{i}\right) }\) for \(j=i_- -1,\ldots , i_+ \).
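As a concrete illustration of the data-release step assumed in Proposition A, the following minimal Python sketch generates the privatized observations \(Z_i\) from sensitive values \(X_i\). The function name and the toy data are illustrative, and the scale \(\lambda _\alpha = 2r/\alpha \) shown here is one standard calibration; the paper chooses r and \(\lambda _\alpha \) as in [5], which may differ.

```python
import numpy as np

def laplace_mechanism(X, r, lam, rng):
    """Release Z_i ~ Laplace(proj_[-r,r](X_i), lam): the locally privatized
    view of the sensitive values X_i assumed in Proposition A."""
    X_proj = np.clip(X, -r, r)                     # projection onto [-r, r]
    return X_proj + rng.laplace(scale=lam, size=len(X_proj))

# Toy example: privatize 1000 hypothetical sensitive values.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=1000)
alpha, r = 1.0, 3.0
lam = 2 * r / alpha        # one standard calibration of the Laplace scale
Z = laplace_mechanism(X, r, lam, rng)
```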
Normal Base Measure: Let \(P_0(dX) = \frac{1}{\sqrt{2 \pi } \sigma } e^{-\frac{(X-\mu )^2}{2\sigma ^2}} dX\) be a Normal distribution. Let us denote \(\tilde{\mu }_j = \frac{(n-2j)\sigma ^2}{\lambda _\alpha } + \mu \). Then the posterior (4) specializes into
$$\begin{aligned} \mathbb {P}(dX^{*}_{k}|Z_{j_{1}:j_{n_k}}) \;\propto \;&e^{-\frac{1}{\lambda _\alpha }\sum \limits _{i=1}^{n}|Z_{j_i}+r|}\, \frac{1}{\sqrt{2 \pi } \sigma } e^{-\frac{(X^{*}_{k}-\mu )^2}{2\sigma ^2}}\, \mathbf {1}_{\{X^{*}_{k}< -r\}}\, dX^{*}_{k} \\&+ \sum \limits _{j=i_- -1}^{i_+} C_j\, e^{\frac{(n-2j)\mu }{\lambda _\alpha } + \frac{(n-2j)^2\sigma ^2}{2\lambda _\alpha ^2}}\, \frac{1}{\sqrt{2 \pi } \sigma } e^{-\frac{(X^{*}_{k}-\tilde{\mu }_j)^2}{2\sigma ^2}}\, \mathbf {1}_{\{\widetilde{Z}_{j}< X^{*}_{k}< \widetilde{Z}_{j+1}\}}\, dX^{*}_{k} \\&+ e^{-\frac{1}{\lambda _\alpha }\sum \limits _{i=1}^{n}|Z_{j_i}-r|}\, \frac{1}{\sqrt{2 \pi } \sigma } e^{-\frac{(X^{*}_{k}-\mu )^2}{2\sigma ^2}}\, \mathbf {1}_{\{X^{*}_{k}> r\}}\, dX^{*}_{k}, \end{aligned}$$
where we have used the fact that
$$\begin{aligned} e^{-\frac{(X-\mu )^2}{2\sigma ^2}}\, e^{\frac{(n-2j)}{\lambda _\alpha } X} \;=\; e^{\frac{(n-2j)\mu }{\lambda _\alpha } + \frac{(n-2j)^2\sigma ^2}{2\lambda _\alpha ^2}}\; e^{-\frac{(X-\tilde{\mu }_j)^2}{2\sigma ^2}}. \end{aligned}$$
Let us denote, for \(j=i_- - 2,\ldots , i_+ + 1\),
$$\begin{aligned} \varPi _{i_- - 2}&= e^{-\frac{1}{\lambda _\alpha }\sum \limits _{i=1}^{n}|Z_{j_i}+r|}\; \frac{1}{2}\left[ 1 + \text {erf}\left( \frac{-r-\mu }{\sigma \sqrt{2}}\right) \right] , \\ \varPi _j&= C_j\, e^{\frac{(n-2j)\mu }{\lambda _\alpha } + \frac{(n-2j)^2\sigma ^2}{2\lambda _\alpha ^2}}\; \frac{1}{2}\left[ \text {erf}\left( \frac{\widetilde{Z}_{j+1}-\tilde{\mu }_j}{\sigma \sqrt{2}}\right) - \text {erf}\left( \frac{\widetilde{Z}_{j}-\tilde{\mu }_j}{\sigma \sqrt{2}}\right) \right] , \quad j = i_- -1,\ldots ,i_+, \\ \varPi _{i_+ + 1}&= e^{-\frac{1}{\lambda _\alpha }\sum \limits _{i=1}^{n}|Z_{j_i}-r|}\; \frac{1}{2}\left[ 1 - \text {erf}\left( \frac{r-\mu }{\sigma \sqrt{2}}\right) \right] , \end{aligned}$$
where \(\text {erf}\) denotes the Gauss error function. Let \((\pi _j)_j = (\varPi _j/\sum _k \varPi _k)_j\) denote the normalized weights. The posterior is then a mixture of truncated Normals with disjoint supports. In order to sample from it, we can proceed in two steps. First, we sample a categorical variable J such that \(\mathbb {P}(J=j) = \pi _j\). If \(J = i_- - 2\), we sample \(X^{*}_{k}\) from a truncated Normal with mean \(\mu \) and variance \(\sigma ^2\), restricted to \((-\infty ,-r)\). If \(J = i_+ + 1\), we sample \(X^{*}_{k}\) from a truncated Normal with the same mean and variance, restricted to \((r,\infty )\). Otherwise, we sample \(X^{*}_{k}\) from a truncated Normal with mean \(\tilde{\mu }_J\) and variance \(\sigma ^2\), restricted to \((\widetilde{Z}_{J}, \widetilde{Z}_{J+1})\).
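To make the two-step procedure concrete, here is a minimal Python sketch of one draw of \(X^{*}_{k}\) given the cluster observations, under a \(N(\mu ,\sigma ^2)\) base measure and the mixture representation derived above. All names are illustrative; this is a sketch of the construction, not the implementation used in the paper's experiments.

```python
import numpy as np
from scipy.stats import norm, truncnorm

def sample_cluster_location(Z, r, lam, mu, sigma, rng):
    """One draw of X*_k | Z for the Laplace mechanism with a N(mu, sigma^2)
    base measure, using the truncated-Normal mixture described above."""
    Z = np.sort(np.asarray(Z, dtype=float))
    n = len(Z)                                     # cluster size
    inside = Z[(Z >= -r) & (Z <= r)]
    Zt = np.concatenate(([-r], inside, [r]))       # endpoints -r, Z_tilde, r
    n_below = int(np.sum(Z < -r))                  # observations strictly below -r

    means, bounds, log_w = [], [], []

    # Left boundary component: X*_k < -r (Laplace likelihood is constant there).
    means.append(mu)
    bounds.append((-np.inf, -r))
    log_w.append(-np.sum(np.abs(Z + r)) / lam)

    # Interior components: X*_k in (Zt[t], Zt[t+1]).
    for t in range(len(inside) + 1):
        j = n_below + t                            # number of observations below X*_k
        b = (n - 2 * j) / lam                      # exponential tilt of the base measure
        means.append(mu + b * sigma ** 2)          # tilted mean (mu_j tilde)
        bounds.append((Zt[t], Zt[t + 1]))
        log_Cj = (np.sum(Z[:j]) - np.sum(Z[j:])) / lam
        log_w.append(log_Cj + b * mu + 0.5 * (b * sigma) ** 2)

    # Right boundary component: X*_k > r.
    means.append(mu)
    bounds.append((r, np.inf))
    log_w.append(-np.sum(np.abs(Z - r)) / lam)

    # Unnormalized weights Pi_j: prefactor times the Gaussian mass of the support.
    log_w = np.array(log_w)
    for c, (m, (lo, hi)) in enumerate(zip(means, bounds)):
        mass = norm.cdf(hi, loc=m, scale=sigma) - norm.cdf(lo, loc=m, scale=sigma)
        log_w[c] += np.log(max(mass, 1e-300))

    # Step 1: pick the mixture component; Step 2: sample the truncated Normal.
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    J = rng.choice(len(probs), p=probs)
    m, (lo, hi) = means[J], bounds[J]
    return truncnorm.rvs((lo - m) / sigma, (hi - m) / sigma,
                         loc=m, scale=sigma, random_state=rng)

# Example usage on one cluster of privatized observations.
rng = np.random.default_rng(1)
Z_cluster = np.array([-0.4, 0.1, 0.7, 2.2])
x_star = sample_cluster_location(Z_cluster, r=3.0, lam=6.0, mu=0.0, sigma=1.0, rng=rng)
```

Within a Gibbs sampler of the type of Algorithm 2 in the main text, a draw of this form would be used to update each cluster location \(X^{*}_{k}\) in turn.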
Appendix B: Proof of Proposition 1
Denote first \(M_P (Z_i) = \int Q(Z_i|X_i) P(dX_i)\), the marginal distribution of the observations when the sensitive data are distributed according to P. Thus, denoting by \(P_*\) the true distribution of the sensitive data \(X_i\), the true marginal distribution of \(Z_i\) is \(M_{P_*}\). We will prove the proposition in the following steps.
1. Step 1: We show that
$$\begin{aligned} \forall \epsilon > 0,\quad \varPi (h(M_P,M_{P_*}) > \epsilon \ |\ Z_{1:n}) \rightarrow 0 \ \ \text {a.s.} \end{aligned}$$
(5)
Here, \(\varPi \) denotes the Dirichlet process prior, \(\varPi ( \cdot |Z_{1:n})\) denotes the posterior under the DPM model, and h the Hellinger distance.
2. Step 2: We will show that for any \(\delta > 0\),
$$\begin{aligned} W_2(P,P_*)^2 \le C_\delta \, h(M_P,M_{P_*}) ^{3/4}+ C \delta ^2, \end{aligned}$$
(6)
where \(W_2\) is the \(\mathbb {L}_2\) Wasserstein distance.
3. Conclusion: Using Steps 1 and 2, we will show that for any \(\epsilon > 0\),
$$\begin{aligned} \varPi (W_1(P,P_*) > \epsilon \ |\ Z_{1:n}) \rightarrow 0 \ \ \text {a.s.} \end{aligned}$$
(7)
Now, since \(W_1\) is convex and uniformly bounded on the space of probability measures on \(\mathcal {X} \subset [a,b]\), Theorem 6.8 of [13] gives that \(\mathbb {E}(P|Z_{1:n})\) converges almost surely to \(P_*\) in the \(W_1\) metric. Since [a, b] is compact, this implies convergence in \(W_k\) for any \(k \ge 1\).
To simplify the reading of the proof, in what follows C denotes a constant (in particular, not depending on n) whose value may change from line to line.
Let us start with the easiest part, namely Eq. (7) of Step 3. Let \(\epsilon > 0\); from Eq. (6), we know that
$$\begin{aligned} \varPi \big (W_2(P,P_*)^2 \le \epsilon \ \big |\ Z_{1:n}\big ) \;\ge \; \varPi \big (C_\delta \, h(M_P,M_{P_*})^{3/4} + C\delta ^2 \le \epsilon \ \big |\ Z_{1:n}\big ). \end{aligned}$$
Take \(\delta \) such that \(C \delta ^2 \le \epsilon /2\). We can hence lower bound the left-hand side of the previous inequality by \( \varPi (C_\delta h(M_P,M_{P_*})^{3/4} \le \epsilon /2 \ |\ Z_{1:n})\), which, by Eq. (5), converges almost surely to 1. This proves convergence in \(W_2\), which implies (7) since \(\mathcal {X} \subset [a,b]\).
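For completeness, the comparison between \(W_1\) and \(W_2\) used here is the elementary bound obtained by applying Jensen's inequality to any coupling \(\pi \) of P and \(P_*\),
$$\begin{aligned} W_1(P,P_*) = \inf _{\pi } \int |x-y| \, \pi (dx,dy) \;\le \; \inf _{\pi } \left( \int |x-y|^2 \, \pi (dx,dy)\right) ^{1/2} = W_2(P,P_*), \end{aligned}$$
so that \(\varPi (W_1(P,P_*)> \epsilon \ |\ Z_{1:n}) \le \varPi (W_2(P,P_*) > \epsilon \ |\ Z_{1:n})\).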
Now let us consider Step 1. The Dirichlet process prior \(\varPi \) induces a prior on the marginal distributions \(M_P\) of the \(Z_i\) (also denoted \(\varPi \)). Since \(Z_i \overset{iid}{\sim } M_{P_*}\), Schwartz's theorem guarantees that (5) holds as long as \(M_{P_*}\) is in the Kullback-Leibler support of \(\varPi \). We will use Theorem 7.2 of [13] to prove it. Let
Let \(Z_i \in \mathcal {Z}\); for any \(X_i \in \mathcal {X}\), the differential privacy condition gives
which corresponds to condition (A1) in the theorem of [13]. We only need to prove that (A2) holds, i.e.
for any probability measure P on \(\mathcal {X}\). To see this, we rewrite the expression inside the \(\log \) as follows
where the last inequality is due to the differential privacy property of Q. This proves Step 1.
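In essence, the rewriting uses that, by the \(\alpha \)-differential privacy of Q, for every \(z \in \mathcal {Z}\) and every probability measure P on \(\mathcal {X}\),
$$\begin{aligned} \log \frac{M_{P_*}(z)}{M_{P}(z)} = \log \frac{\int Q(z|x) \, P_*(dx)}{\int Q(z|x) \, P(dx)} \le \log \frac{\sup _{x \in \mathcal {X}} Q(z|x)}{\inf _{x \in \mathcal {X}} Q(z|x)} \le \alpha , \end{aligned}$$
so that the Kullback-Leibler divergence between \(M_{P_*}\) and \(M_P\) is uniformly bounded.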
It remains to prove Step 2. We remark first that, since the noise is additive in our setting, \(Q(Z_i | X_i) = C_Q e^{-\alpha \rho (X_i-Z_i)/\varDelta }\) where \(C_Q\) is a constant (independent of \(X_i\)). Denote \(f: t \mapsto C_Q e^{-\alpha \rho (t)/\varDelta }\) and \(\mathcal {L}(f)\) its Fourier transform. Denote \(P*f\) the convolution of P and f. We also recall that, using the symmetry of \(\rho \),
$$\begin{aligned} M_P(Z_i) = \int Q(Z_i|X_i)\, P(dX_i) = \int f(Z_i - X_i)\, P(dX_i) = (P*f)(Z_i). \end{aligned}$$
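As a concrete instance (useful to check the assumptions on f below): for the Laplace mechanism, \(\rho (t) = |t|\) and \(\varDelta /\alpha = \lambda _\alpha \), so f is the \(\text {Laplace}(0,\lambda _\alpha )\) density and, with the characteristic-function convention for \(\mathcal {L}\),
$$\begin{aligned} \mathcal {L}(f)(t) = \frac{1}{1+\lambda _\alpha ^2 t^2}, \end{aligned}$$
which is continuous and strictly positive on \(\mathbb {R}\).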
This part follows the same strategy as the proof of Theorem 2 in [20], the main difference being that here we are not interested in rates and hence need weaker conditions on f. In a similar way, we define a symmetric density K on \(\mathbb {R}\) whose Fourier transform \(\mathcal {L}(K)\) is continuous, bounded and with support included in \([-1,1]\). Let \(\delta \in (0,1)\) and \(K_\delta (x) = \frac{1}{\delta } K(x/\delta )\). Following the lines of the proof of Theorem 2 in [20], we find that
where C is a constant (depending only on K), and that
where \(g_\delta \) is the inverse Fourier transform of \( \frac{\mathcal {L}(K_\delta )}{\mathcal {L}(f)}\) and \(d_{TV}\) the total variation distance. Now, using Plancherel’s identity, we obtain
$$\begin{aligned} \Vert g_\delta \Vert _2^2&= C \int \left| \frac{\mathcal {L}(K_\delta )(t)}{\mathcal {L}(f)(t)}\right| ^2 dt \\&= C \int _{-1/\delta }^{1/\delta } \left| \frac{\mathcal {L}(K_\delta )(t)}{\mathcal {L}(f)(t)}\right| ^2 dt \\&\le C \sup \limits _{t \in [-1/ \delta , 1/ \delta ]} |\mathcal {L}(f)(t)|^{-2}, \end{aligned}$$
where the second line comes from the fact that the support of \(\mathcal {L}(K_\delta )\) is included in \([-1/\delta ,1/\delta ]\), and the third line from the fact that \(\mathcal {L}(K_\delta )\) is bounded. Since \(|\mathcal {L}(f)|\) is strictly positive (from the assumptions) and continuous, it follows that \(C_\delta ^2 = C \sup \limits _{[-1/ \delta , 1/ \delta ] } |\mathcal {L}(f)|^{-2} < +\infty \). Using the bound \(d_{TV} \le \sqrt{2}\ h\), we can write
which together with (8) gives
Convergence of moments follows directly from [22] (Theorems 6.7 and 6.8).