Abstract
Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. We describe the matrix mechanism, an algorithm for answering a workload of linear counting queries that adapts the noise distribution to properties of the provided queries. Given a workload, the mechanism uses a different set of queries, called a query strategy, which are answered using a standard Laplace or Gaussian mechanism. Noisy answers to the workload queries are then derived from the noisy answers to the strategy queries. This two-stage process can result in a more complex, correlated noise distribution that preserves differential privacy but increases accuracy. We provide a formal analysis of the error of query answers produced by the mechanism and investigate the problem of computing the optimal query strategy in support of a given workload. We show that this problem can be formulated as a rank-constrained semidefinite program. We analyze two seemingly distinct techniques proposed in the literature, whose similar behavior is explained by viewing them as instances of the matrix mechanism. We also describe an extension of the mechanism in which nonnegativity constraints are included in the derivation process and provide experimental evidence of its efficacy.
References
Ács, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through lossy compression. In: ICDM, pp. 1–10 (2012)
Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: PODS (2007)
Ben-Israel, A., Greville, T.: Generalized Inverses: Theory and Applications, vol. 15. Springer, Berlin (2003)
Cormode, G., Procopiuc, M., Shen, E., Srivastava, D., Yu, T.: Differentially private spatial decompositions. In: ICDE (2012)
Dattorro, J.: Convex Optimization & Euclidean Distance Geometry. Meboo Publishing, USA (2005)
Ding, B., Winslett, M., Han, J., Li, Z.: Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD, pp. 217–228 (2011)
Dwork, C.: Differential privacy: a survey of results. In: TAMC (2008)
Dwork, C.: The differential privacy frontier. In: TCC (2009)
Dwork, C.: A firm foundation for private data analysis. Commun. ACM 54(1), 86–95 (2011)
Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: EUROCRYPT, pp. 486–503 (2006)
Dwork, C., Naor, M., Reingold, O., Rothblum, G., Vadhan, S.: On the complexity of differentially private data release: efficient algorithms and hardness results. In: STOC, pp. 381–390 (2009)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC (2006)
Dwork, C., Rothblum, G.N., Vadhan, S.P.: Boosting and differential privacy. In: FOCS, pp. 51–60 (2010)
Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. In: STOC (2009)
Gupta, A., Roth, A., Ullman, J.: Iterative constructions and private data release. In: TCC, pp. 339–356 (2012)
Hardt, M., Ligett, K., McSherry, F.: A simple and practical algorithm for differentially private data release. In: NIPS, pp. 2348–2356 (2012)
Hardt, M., Rothblum, G.: A multiplicative weights mechanism for privacy-preserving data analysis. In: FOCS, pp. 61–70 (2010)
Hardt, M., Talwar, K.: On the geometry of differential privacy. In: STOC, pp. 705–714 (2010)
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially-private histograms through consistency. PVLDB 3(1–2), 1021–1032 (2010)
Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: PODS, pp. 123–134 (2010)
Li, C., Miklau, G.: An adaptive mechanism for accurate query answering under differential privacy. PVLDB 5(6), 514–525 (2012)
McSherry, F., Mironov, I.: Differentially private recommender systems: building privacy into the Netflix prize contenders. In: SIGKDD (2009)
McSherry, F.D.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: SIGMOD, pp. 19–30 (2009)
Nikolov, A., Talwar, K., Zhang, L.: The geometry of differential privacy: the sparse and approximate cases. In: STOC (2013)
Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: STOC, pp. 75–84 (2007)
Qardaji, W.H., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. PVLDB 6(14), 1954–1965 (2013)
Roth, A., Roughgarden, T.: Interactive privacy via the median mechanism. In: STOC, pp. 765–774 (2010)
Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: ICDE, pp. 225–236 (2010)
Xiao, Y., Gardner, J.J., Xiong, L.: DPCube: Releasing differentially private data cubes for health information. In: ICDE, pp. 1305–1308 (2012)
Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G., Winslett, M.: Differentially private histogram publication. VLDB J 22(6), 797–822 (2013)
Yaroslavtsev, G., Cormode, G., Procopiuc, C.M., Srivastava, D.: Accurate and efficient private release of datacubes and contingency tables. In: ICDE (2013)
Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Low-rank mechanism: optimizing batch queries under differential privacy. In: VLDB (2012)
Acknowledgments
We appreciate the comments of each of the anonymous reviewers. Hay, Li, and Miklau were supported by the National Science Foundation under NSF Award Numbers IIS-0964094, CNS-1012748 and CNS-1409143. McGregor was supported by CCF-0953754. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.
Appendices
Appendix 1: Linear algebra fundamentals
In this section, we summarize the concepts and results in linear algebra and matrix analysis that are used throughout the paper. We use \(\text{ diag }(d_1, \ldots , d_n)\) to denote the \(n \times n\) diagonal matrix with scalars \(d_i\) on the diagonal and \(\mathbf {0}^{m \times n}\) to denote a matrix of zeroes with m rows and n columns. Recall that for a matrix \(\mathbf {A}\), \(\mathbf {A}^T\) is its transpose and \({\mathbf {A}}^{-1}\) is its inverse. We say \(\mathbf {A}\) is symmetric if \(\mathbf {A}^T=\mathbf {A}\) and orthogonal if \(\mathbf {A}^T={\mathbf {A}}^{-1}\). The rank of a matrix \(\mathbf {A}\), \(\mathrm{rank}(\mathbf {A})\), is the size of the largest set of linearly independent rows (or, equivalently, columns) of \(\mathbf {A}\). We say a matrix has full row (column) rank if its rank equals its number of rows (columns). In particular, \({\mathbf {A}}^{-1}\) exists if and only if \(\mathbf {A}\) is a square matrix with full rank.
If \(\mathbf {A}\) is a square matrix, the trace of \(\mathbf {A}\), denoted \(\text{ trace }(\mathbf {A})\), is the sum of the entries on the main diagonal of \(\mathbf {A}\). The trace has an important property: it is invariant under cyclic permutations; i.e., if matrix \(\mathbf {A}_1\) has m columns, matrix \(\mathbf {A}_3\) has m rows, and the product \(\mathbf {A}_1\mathbf {A}_2\mathbf {A}_3\) is defined and square, then
$$\text{ trace }(\mathbf {A}_1\mathbf {A}_2\mathbf {A}_3)=\text{ trace }(\mathbf {A}_3\mathbf {A}_1\mathbf {A}_2).$$
Another concept related to the trace is the Frobenius norm. The Frobenius norm of \(\mathbf {A}\), denoted \(||\mathbf {A}||_F\), is defined as \(\sqrt{\text{ trace }(\mathbf {A}^T\mathbf {A})}\) or, equivalently, as the square root of the sum of the squares of all entries in \(\mathbf {A}\).
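As a quick numerical sanity check, the sketch below (assuming NumPy; the matrix sizes and seed are arbitrary choices) confirms that \(\text{trace}(\mathbf{A}_1\mathbf{A}_2\mathbf{A}_3)=\text{trace}(\mathbf{A}_3\mathbf{A}_1\mathbf{A}_2)\) for conformable matrices and that the trace-based and entrywise definitions of the Frobenius norm agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Conformable shapes: A1 is k x m, A2 is m x m, A3 is m x k,
# so A1 @ A2 @ A3 is k x k and A3 @ A1 @ A2 is m x m.
k, m = 4, 3
A1 = rng.standard_normal((k, m))
A2 = rng.standard_normal((m, m))
A3 = rng.standard_normal((m, k))

# Cyclic permutations of the product leave the trace unchanged.
t1 = np.trace(A1 @ A2 @ A3)
t2 = np.trace(A3 @ A1 @ A2)
t3 = np.trace(A2 @ A3 @ A1)

# Frobenius norm three ways: trace definition, sum of squared entries, NumPy built-in.
A = rng.standard_normal((5, 3))
fro_trace = np.sqrt(np.trace(A.T @ A))
fro_entries = np.sqrt(np.sum(A ** 2))
fro_numpy = np.linalg.norm(A, "fro")

print(np.isclose(t1, t2), np.isclose(t1, t3))
print(np.isclose(fro_trace, fro_entries), np.isclose(fro_trace, fro_numpy))
```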
Matrix decompositions are used extensively in the paper. We focus on two of them: the eigenvalue decomposition and the singular value decomposition. The eigenvalue decomposition of a matrix \(\mathbf {A}\) always exists when \(\mathbf {A}\) is symmetric. It can be written as \(\mathbf {A}=\mathbf {Q}\mathbf {D}\mathbf {Q}^T\), where \(\mathbf {Q}\) is an orthogonal matrix whose columns are eigenvectors of \(\mathbf {A}\) and \(\mathbf {D}\) is a diagonal matrix whose diagonal entries are the corresponding eigenvalues of \(\mathbf {A}\). The singular value decomposition of \(\mathbf {A}\) always exists and has the form \(\mathbf {A}=\mathbf {Q}\mathbf {D}\mathbf {P}^T\), where \(\mathbf {Q}\) and \(\mathbf {P}\) are orthogonal matrices and \(\mathbf {D}\) is a diagonal matrix of nonnegative singular values, padded with rows or columns of 0s.
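Both decompositions can be reproduced with NumPy, as in the sketch below (an illustration, not part of the paper). Note the naming assumption: NumPy's `svd` returns `U`, the singular values `s`, and `Vt`, which correspond to \(\mathbf{Q}\), the diagonal of \(\mathbf{D}\), and \(\mathbf{P}^T\) in the notation above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric matrix: eigendecomposition S = Q D Q^T with orthogonal Q.
B = rng.standard_normal((4, 4))
S = B + B.T
eigvals, Q = np.linalg.eigh(S)      # eigh is intended for symmetric (Hermitian) input
D = np.diag(eigvals)
sym_ok = np.allclose(S, Q @ D @ Q.T) and np.allclose(Q.T @ Q, np.eye(4))

# Arbitrary 5 x 3 matrix: SVD A = Q D P^T, where D is 5 x 3 (diagonal padded with zero rows).
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)         # full_matrices=True by default: U is 5 x 5, Vt is 3 x 3
D_pad = np.zeros((5, 3))
D_pad[:3, :3] = np.diag(s)          # singular values on the diagonal, zero rows below
svd_ok = np.allclose(A, U @ D_pad @ Vt)

print(sym_ok, svd_ok)
```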
We will also rely on the notion of a positive semidefinite matrix. A symmetric square matrix \(\mathbf {A}\) is called positive semidefinite, denoted \(\mathbf {A}\succeq 0\), if \(\mathbf {x}^T\mathbf {A}\mathbf {x}\ge 0\) for every vector \(\mathbf {x}\). In particular, \(\mathbf {A}^T\mathbf {A}\succeq 0\) for any matrix \(\mathbf {A}\). We now present two conditions that are equivalent to positive semidefiniteness.
Proposition 15
Given an \(n\times n\) symmetric matrix \(\mathbf {A}\), each of the following conditions is equivalent to \(\mathbf {A}\succeq 0\).
(i) All the eigenvalues of \(\mathbf {A}\) are nonnegative.
(ii) For any \(1\le i_1<\cdots < i_k\le n\), the determinant of the matrix that consists of the intersection of the \(i_1^{th},\ldots , i_k^{th}\) rows and \(i_1^{th},\ldots , i_k^{th}\) columns of matrix \(\mathbf {A}\) is nonnegative.
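The two conditions of Proposition 15 can be tested directly. The sketch below assumes NumPy; the function names and the tolerance `tol` (needed because floating-point eigenvalues and determinants of singular matrices can come out slightly negative) are hypothetical choices. Condition (ii) enumerates all \(2^n-1\) index subsets, so it is only practical for small \(n\).

```python
import itertools
import numpy as np

def psd_by_eigenvalues(A, tol=1e-10):
    """Condition (i): all eigenvalues of the symmetric matrix A are nonnegative."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

def psd_by_principal_minors(A, tol=1e-10):
    """Condition (ii): every principal minor det(A[I, I]) is nonnegative."""
    n = A.shape[0]
    for k in range(1, n + 1):
        for idx in itertools.combinations(range(n), k):
            sub = A[np.ix_(idx, idx)]        # rows and columns i_1, ..., i_k
            if np.linalg.det(sub) < -tol:
                return False
    return True

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 4))
psd = M @ M.T                                  # equals B^T B with B = M^T, hence PSD
not_psd = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1

print(psd_by_eigenvalues(psd), psd_by_principal_minors(psd))
print(psd_by_eigenvalues(not_psd), psd_by_principal_minors(not_psd))
```

Both checks agree on each input, as Proposition 15 predicts; the eigenvalue test is the one to use in practice because it scales polynomially.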
In addition, we consider a generalization of the matrix inverse, called the Moore–Penrose pseudoinverse, which is defined as follows:
Definition 17
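(Moore–Penrose Pseudoinverse [3]) Given an \(m\times n\) matrix \(\mathbf {A}\), a matrix \(\mathbf {A}^{+}\) is the Moore–Penrose pseudoinverse of \(\mathbf {A}\) if it satisfies each of the following:
(i) \(\mathbf {A}\mathbf {A}^{+}\mathbf {A}=\mathbf {A}\);
(ii) \(\mathbf {A}^{+}\mathbf {A}\mathbf {A}^{+}=\mathbf {A}^{+}\);
(iii) \((\mathbf {A}\mathbf {A}^{+})^T=\mathbf {A}\mathbf {A}^{+}\);
(iv) \((\mathbf {A}^{+}\mathbf {A})^T=\mathbf {A}^{+}\mathbf {A}\).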
The Moore–Penrose pseudoinverse is unique and can be computed using the singular value decomposition of a matrix.
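The sketch below, assuming NumPy, checks the four Penrose conditions \(\mathbf{A}\mathbf{A}^{+}\mathbf{A}=\mathbf{A}\), \(\mathbf{A}^{+}\mathbf{A}\mathbf{A}^{+}=\mathbf{A}^{+}\), \((\mathbf{A}\mathbf{A}^{+})^T=\mathbf{A}\mathbf{A}^{+}\), and \((\mathbf{A}^{+}\mathbf{A})^T=\mathbf{A}^{+}\mathbf{A}\) for the output of `np.linalg.pinv` on a rank-deficient matrix that has no ordinary inverse. The helper name is hypothetical, and the `rcond=1e-10` cutoff is an assumption chosen so that numerically zero singular values are treated as exactly zero.

```python
import numpy as np

def is_moore_penrose_inverse(A, X, tol=1e-10):
    """Check the four Penrose conditions for a candidate X = A^+."""
    return (np.allclose(A @ X @ A, A, atol=tol)
            and np.allclose(X @ A @ X, X, atol=tol)
            and np.allclose((A @ X).T, A @ X, atol=tol)
            and np.allclose((X @ A).T, X @ A, atol=tol))

rng = np.random.default_rng(3)
# A 4 x 3 matrix of rank 2: it is not square, and even A^T A is singular.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
A_pinv = np.linalg.pinv(A, rcond=1e-10)

print(A_pinv.shape)                          # (3, 4): A^+ is n x m
print(is_moore_penrose_inverse(A, A_pinv))
```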
Proposition 16
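([3]) Given an \(n\times n\) diagonal matrix \(\mathbf {D}_0=\{d_{ij}\}\), \(\mathbf {D}_0^{+}=\{d'_{ij}\}\) is the \(n\times n\) diagonal matrix such that
$$d'_{ij}=\begin{cases} 1/d_{ij} & \text{if } i=j \text{ and } d_{ij}\ne 0,\\ 0 & \text{otherwise.}\end{cases}$$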
For an \(m\times n\) matrix \(\mathbf {D}\) consisting of a diagonal matrix \(\mathbf {D}_0\) padded with columns (rows) of 0s, \(\mathbf {D}^{+}\) is the \(n\times m\) matrix consisting of the diagonal matrix \(\mathbf {D}_0^{+}\) padded with rows (columns) of 0s. Given a matrix \(\mathbf {A}\) with singular value decomposition \(\mathbf {A}=\mathbf {Q}\mathbf {D}\mathbf {P}^T\), we have \(\mathbf {A}^{+}=\mathbf {P}\mathbf {D}^{+}\mathbf {Q}^T\).
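Proposition 16 translates almost line by line into code. The sketch below (assuming NumPy; `pinv_diagonal` and `pinv_via_svd` are hypothetical names) inverts the nonzero diagonal entries, pads \(\mathbf{D}_0^{+}\) into an \(n\times m\) matrix, and forms \(\mathbf{P}\mathbf{D}^{+}\mathbf{Q}^T\). Singular values at or below `rcond` times the largest one are treated as zero, which is the same cutoff rule that `np.linalg.pinv` documents, so the two results can be compared.

```python
import numpy as np

def pinv_diagonal(D0, tol=0.0):
    """Pseudoinverse of a square diagonal matrix: invert nonzero diagonal entries, keep zeros."""
    d = np.diag(D0)
    d_plus = np.zeros_like(d)
    nonzero = np.abs(d) > tol
    d_plus[nonzero] = 1.0 / d[nonzero]
    return np.diag(d_plus)

def pinv_via_svd(A, rcond=1e-10):
    """A^+ = P D^+ Q^T, where A = Q D P^T and D is m x n (D0 padded with zeros)."""
    m, n = A.shape
    Q, s, Pt = np.linalg.svd(A)             # Q: m x m, s: min(m, n) values, Pt = P^T: n x n
    k = s.size
    D0_plus = pinv_diagonal(np.diag(s), tol=rcond * s.max())
    D_plus = np.zeros((n, m))               # D^+ is n x m
    D_plus[:k, :k] = D0_plus
    return Pt.T @ D_plus @ Q.T

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5 x 4, rank 2
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A, rcond=1e-10)))
print(np.diag(pinv_diagonal(np.diag([2.0, 0.0, -4.0]))))        # [0.5, 0, -0.25]
```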
When \(\mathbf {A}\) has full column rank, \(\mathbf {A}^{+}=(\mathbf {A}^T\mathbf {A})^{-1}\mathbf {A}^T\). We include some important properties of the Moore–Penrose pseudoinverse in the following proposition.
Proposition 17
([3]) The Moore–Penrose pseudoinverse satisfies the following properties:
1. Given any matrix \(\mathbf {A}\), there exists a unique matrix that is the Moore–Penrose pseudoinverse of \(\mathbf {A}\).
2. Given a vector \(\mathbf {y}\), we have \(||\mathbf {y}-\mathbf {A}\mathbf {x}||_2\ge ||\mathbf {y}-\mathbf {A}\mathbf {A}^{+}\mathbf {y}||_2\) for any vector \(\mathbf {x}\).
3. For any satisfiable linear system \(\mathbf {B}\mathbf {A}=\mathbf {W}\), \(\mathbf {W}\mathbf {A}^{+}\) is a solution to the linear system and \(||\mathbf {W}\mathbf {A}^{+}||_F\le ||\mathbf {B}||_F\) for any solution \(\mathbf {B}\) to the linear system.
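The sketch below, assuming NumPy, illustrates the closed form \(\mathbf{A}^{+}=(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\) for a full-column-rank matrix and properties 2 and 3 of Proposition 17. It compares \(\mathbf{A}^{+}\mathbf{y}\) against random alternatives rather than proving optimality, and it builds a second solution of \(\mathbf{B}\mathbf{A}=\mathbf{W}\) by adding rows from the left null space of \(\mathbf{A}\); the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Full column rank: A^+ = (A^T A)^{-1} A^T.
A = rng.standard_normal((6, 3))
closed_form = np.linalg.inv(A.T @ A) @ A.T
full_rank_ok = np.allclose(closed_form, np.linalg.pinv(A))

# Property 2: x = A^+ y minimizes ||y - A x||_2 (compared against random x).
y = rng.standard_normal(6)
best = np.linalg.norm(y - A @ np.linalg.pinv(A) @ y)
others = [np.linalg.norm(y - A @ rng.standard_normal(3)) for _ in range(1000)]
ls_ok = all(best <= r + 1e-12 for r in others)

# Property 3: B = W A^+ solves B A = W with minimum Frobenius norm.
A2 = rng.standard_normal((5, 3))        # full column rank, so B A2 = W is solvable for any W
W = rng.standard_normal((2, 3))
B_min = W @ np.linalg.pinv(A2)
solves = np.allclose(B_min @ A2, W)

# Other solutions are B_min + Z with Z A2 = 0: rows of Z lie in the left null space of A2.
U, s, Vt = np.linalg.svd(A2)
left_null = U[:, 3:].T                  # 2 x 5 basis for {z : z A2 = 0}
B_other = B_min + rng.standard_normal((2, 2)) @ left_null
other_solves = np.allclose(B_other @ A2, W)
min_norm_ok = np.linalg.norm(B_min, "fro") <= np.linalg.norm(B_other, "fro")

print(full_rank_ok, ls_ok, solves, other_solves, min_norm_ok)
```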
Appendix 2: Proofs
1.1 Proofs from Section 4
Proposition 6 The matrix mechanism \(\mathcal {M}_{\mathcal {K}, \mathbf {A}}\) inherits the privacy guarantee of \(\mathcal {K}\) and is unbiased if \(\mathcal {K}\) is unbiased.
Proof
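According to Eq. (1), the matrix mechanism can be viewed as post-processing the output of \(\mathcal {K}(\mathbf {A}, \mathbf {x})\) without accessing \(\mathbf {x}\), and hence it has the same privacy guarantee as \(\mathcal {K}(\mathbf {A}, \mathbf {x})\). In addition, since \(\mathbf {W}\mathbf {A}^{+}\mathbf {A}=\mathbf {W}\), we have:
$$\mathbb {E}[\mathcal {M}_{\mathcal {K}, \mathbf {A}}(\mathbf {W}, \mathbf {x})]=\mathbb {E}[\mathbf {W}\mathbf {A}^{+}\mathcal {K}(\mathbf {A}, \mathbf {x})]=\mathbf {W}\mathbf {A}^{+}\mathbb {E}[\mathcal {K}(\mathbf {A}, \mathbf {x})]=\mathbf {W}\mathbf {A}^{+}\mathbf {A}\mathbf {x}=\mathbf {W}\mathbf {x}.$$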
\(\square \)
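The sketch below illustrates Proposition 6 empirically. It assumes that Eq. (1) has the form \(\mathbf{W}\mathbf{A}^{+}\mathcal{K}(\mathbf{A},\mathbf{x})\) and that \(\mathcal{K}\) is the Laplace mechanism with noise scale \(||\mathbf{A}||_1/\epsilon\), where \(||\mathbf{A}||_1\) is the maximum column L1 norm. The data vector, workload, and strategy are hypothetical examples, and averaging many runs only illustrates unbiasedness; it does not prove it.

```python
import numpy as np

def laplace_mechanism(A, x, epsilon, rng):
    """Answer strategy queries A x with i.i.d. Laplace noise.
    Assumption: the L1 sensitivity of A is its maximum column L1 norm."""
    sensitivity = np.abs(A).sum(axis=0).max()
    return A @ x + rng.laplace(scale=sensitivity / epsilon, size=A.shape[0])

def matrix_mechanism(W, A, x, epsilon, rng):
    """Post-process noisy strategy answers as W A^+ K(A, x); x is used only inside K."""
    y = laplace_mechanism(A, x, epsilon, rng)
    return W @ np.linalg.pinv(A) @ y

rng = np.random.default_rng(6)
x = np.array([10.0, 0.0, 5.0, 3.0])              # hypothetical data vector over 4 cells
W = np.array([[1.0, 1.0, 0.0, 0.0],              # hypothetical workload: two range queries
              [0.0, 1.0, 1.0, 1.0]])
A = np.vstack([np.ones((1, 4)), np.eye(4)])      # hypothetical strategy: total plus cell counts

assert np.allclose(W @ np.linalg.pinv(A) @ A, W) # A supports W, so W A^+ A = W
trials = np.array([matrix_mechanism(W, A, x, 1.0, rng) for _ in range(10000)])
print(trials.mean(axis=0), W @ x)                # empirical mean is close to W x
```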
1.2 Proofs from Section 5
Corollary 1 Given two query strategies \(\mathbf {A}_1\) and \(\mathbf {A}_2\) that are profile equivalent, for any query \(\mathbf {W},\,\mathbf {A}_1\) supports \(\mathbf {W}\) if and only if \(\mathbf {A}_2\) supports \(\mathbf {W}\). Furthermore, there exists a nonzero constant c such that given a differentially private algorithm \(\mathcal {K}\), for any workload query \(\mathbf {W}\) that \(\mathbf {A}_1\) and \(\mathbf {A}_2\) support, \({\textsc {TotalError}}_{\mathcal {K}, \mathbf {A}_1}( \mathbf {W} )=c\cdot {\textsc {TotalError}}_{\mathcal {K}, \mathbf {A}_2}( \mathbf {W} )\).
Proof
Given a query workload \(\mathbf {W}\), if \(\mathbf {A}_1\) supports \(\mathbf {W}\), there exists a matrix \(\mathbf {X}\) such that \(\mathbf {W}=\mathbf {X}\mathbf {A}_1\). According to Proposition 11(iii), \(\mathbf {A}_1\) and \(\mathbf {A}_2\) are profile equivalent if and only if there exists a nonzero constant c and an orthogonal matrix \(\mathbf {Q}\) such that \(\mathbf {A}_1=c\cdot \mathbf {Q}\mathbf {A}_2\). Then, \(c\cdot \mathbf {X}\mathbf {Q}\) satisfies \(\mathbf {W}=c\cdot \mathbf {X}\mathbf {Q}\mathbf {A}_2\), and therefore, \(\mathbf {A}_2\) supports \(\mathbf {W}\) as well.
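The definition of profile equivalence indicates that there is a constant \(c'\) such that \((\mathbf {A}_1^T\mathbf {A}_1)^{+}= c'\cdot (\mathbf {A}_2^T\mathbf {A}_2)^{+}\). Thus, for any query workload that \(\mathbf {A}_1\) supports:
$$\frac{{\textsc {TotalError}}_{\mathcal {K}, \mathbf {A}_1}( \mathbf {W} )}{{\textsc {TotalError}}_{\mathcal {K}, \mathbf {A}_2}( \mathbf {W} )}=\frac{||\mathbf {A}_1||^2\,\text{ trace }(\mathbf {W}(\mathbf {A}_1^T\mathbf {A}_1)^{+}\mathbf {W}^T)}{||\mathbf {A}_2||^2\,\text{ trace }(\mathbf {W}(\mathbf {A}_2^T\mathbf {A}_2)^{+}\mathbf {W}^T)}=c'\cdot \frac{||\mathbf {A}_1||^2}{||\mathbf {A}_2||^2},$$
where \(||\cdot ||\) is the sensitivity norm used by \(\mathcal {K}\) (\(||\cdot ||_1\) or \(||\cdot ||_2\)), and the ratio is a value that is independent of \(\mathbf {W}\). \(\square \)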
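To see why the ratio does not depend on \(\mathbf{W}\), the sketch below (assuming NumPy) builds \(\mathbf{A}_1=c\cdot\mathbf{Q}\mathbf{A}_2\) with a random orthogonal \(\mathbf{Q}\), the characterization of profile equivalence quoted from Proposition 11(iii), and evaluates the workload-dependent factor \(\text{trace}(\mathbf{W}(\mathbf{A}^T\mathbf{A})^{+}\mathbf{W}^T)\) for workloads of different sizes. Because \(\mathbf{A}_1^T\mathbf{A}_1=c^2\,\mathbf{A}_2^T\mathbf{A}_2\), every ratio equals \(1/c^2\). The sizes, seed, and value of `c` are arbitrary.

```python
import numpy as np

def trace_term(W, A):
    """trace(W (A^T A)^+ W^T), which also equals ||W A^+||_F^2."""
    return np.trace(W @ np.linalg.pinv(A.T @ A) @ W.T)

rng = np.random.default_rng(7)
n = 4
A2 = rng.standard_normal((6, n))
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # random 6 x 6 orthogonal matrix
c = 3.0
A1 = c * Q @ A2                                     # profile equivalent to A2

ratios = [trace_term(W, A1) / trace_term(W, A2)
          for W in (rng.standard_normal((k, n)) for k in (1, 3, 5))]
print(ratios)   # each ratio equals 1 / c**2 up to rounding
```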
1.3 Proofs from Section 6
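Corollary 2 For any linear counting query \(\mathbf {w}\) and differentially private mechanism \(\mathcal {K}\),
$$\frac{1}{2}{\textsc {Error}}_{\mathcal {K}, \mathbf {H}_n}( \mathbf {w} )\le {\textsc {Error}}_{\mathcal {K}, \mathbf {Y}_n}( \mathbf {w} )\le 2{\textsc {Error}}_{\mathcal {K}, \mathbf {H}_n}( \mathbf {w} ).$$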
Proof
Let \(\mathbf {H}'_n\) be the matrix obtained by removing the row of all 1s from \({\left[ \begin{array}{c}\mathbf {H}_n \\ \mathbf {I}_n\end{array}\right] }\). By Theorem 1, \(\mathbf {H}'_n\) and \(\mathbf {Y}_n\) are equivalent strategies under both \(\epsilon \)- and \((\epsilon ,\delta )\)-differentially private mechanisms, so it is sufficient to prove that for any linear counting query \(\mathbf {w}\),
$$\frac{1}{2}{\textsc {Error}}_{\mathcal {K}, \mathbf {H}_n}( \mathbf {w} )\le {\textsc {Error}}_{\mathcal {K}, \mathbf {H}'_n}( \mathbf {w} )\le 2{\textsc {Error}}_{\mathcal {K}, \mathbf {H}_n}( \mathbf {w} ).$$
Let \(\mathbf {v}=\mathbf {w}\mathbf {H}_n^{+}\) and \(\mathbf {v}'\) be a vector such that
One can verify that \(\mathbf {v}'\mathbf {H}'_n=\mathbf {v}\mathbf {H}_n=\mathbf {w}\). Since
and noting that \(||\mathbf {H}_n||_1=||\mathbf {H}_n'||_1\) and \(||\mathbf {H}_n||_2=||\mathbf {H}_n'||_2\), we have \({\textsc {Error}}_{\mathcal {K}, \mathbf {H}'_n}( \mathbf {w} )\le 2{\textsc {Error}}_{\mathcal {K}, \mathbf {H}_n}( \mathbf {w} )\).
On the other hand, \(\mathbf {H}'_n\) contains two copies of the queries in \(\mathbf {I}_n\), which is equivalent to reducing the error on those queries by a factor of 2. Since all other queries in \(\mathbf {H}'_n\) are also contained in \(\mathbf {H}_n\), we have \({\textsc {Error}}_{\mathcal {K}, \mathbf {H}_n}( \mathbf {w} )\le 2{\textsc {Error}}_{\mathcal {K}, \mathbf {H}'_n}( \mathbf {w} )\). \(\square \)
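The bound \(\frac{1}{2}\textsc{Error}_{\mathcal{K},\mathbf{H}_n}(\mathbf{w})\le\textsc{Error}_{\mathcal{K},\mathbf{H}'_n}(\mathbf{w})\le 2\,\textsc{Error}_{\mathcal{K},\mathbf{H}_n}(\mathbf{w})\) can be checked numerically. The sketch below assumes NumPy, that \(\mathbf{H}_n\) is the binary hierarchical strategy over \(n=2^k\) cells with the all-1s root as its first row, and that, as for the Laplace and Gaussian mechanisms, \(\textsc{Error}_{\mathcal{K},\mathbf{A}}(\mathbf{w})\) is proportional to \(||\mathbf{A}||^2\,||\mathbf{w}\mathbf{A}^{+}||_2^2\) with a constant that depends only on \(\mathcal{K}\). Since the sensitivities of \(\mathbf{H}_n\) and \(\mathbf{H}'_n\) agree, only the \(||\mathbf{w}\mathbf{A}^{+}||_2^2\) factors enter the ratio. The function names are hypothetical.

```python
import numpy as np

def hierarchical_strategy(n):
    """Binary hierarchical strategy over n = 2^k cells: one row per tree node,
    from the root (all 1s) down to the leaves (the rows of I_n)."""
    rows = []
    size = n
    while size >= 1:
        for start in range(0, n, size):
            row = np.zeros(n)
            row[start:start + size] = 1.0
            rows.append(row)
        size //= 2
    return np.array(rows)

def error_ratio(w, A, B):
    """Error_{K,B}(w) / Error_{K,A}(w), assuming ||A|| = ||B|| so only ||w X^+||_2^2 differs."""
    return np.sum((w @ np.linalg.pinv(B)) ** 2) / np.sum((w @ np.linalg.pinv(A)) ** 2)

n = 16
H = hierarchical_strategy(n)
H_prime = np.vstack([H[1:], np.eye(n)])   # drop the all-1s root row, append a second copy of I_n

# Sensitivities agree: every column has one nonzero entry per tree level in both matrices.
assert np.abs(H).sum(axis=0).max() == np.abs(H_prime).sum(axis=0).max()

rng = np.random.default_rng(8)
ratios = [error_ratio(rng.standard_normal(n), H, H_prime) for _ in range(1000)]
print(min(ratios), max(ratios))   # both fall within [1/2, 2]
```

The lower end can be attained exactly: for \(n=2\) and \(\mathbf{w}=(1,-1)\), the ratio is exactly \(1/2\).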
Corollary 3 (Cell condition simplification) Given a workload \(\mathbf {W}_1\subseteq \mathbf {W}_R\) with m queries and its corresponding list of cell conditions, \(\varPhi _1\), there exists a workload \(\mathbf {W}_2\) and a list of cell conditions, \(\varPhi _2\), such that \((\mathbf {W}_1, \varPhi _1)\equiv (\mathbf {W}_2, \varPhi _2)\) and \(|\varPhi _2| \le 2m-1\).
Proof
Since \(\mathbf {W}_1\subseteq \mathbf {W}_R\), let \(b_1, \ldots , b_{2m}\) be the boundaries of the queries in \(\mathbf {W}_1\), ordered so that \(b_1\le \cdots \le b_{2m}\). Each query \(q\in \mathbf {W}_1\) contains either all cells \(b_i, \ldots , b_{i+1}-1\) or none of them; therefore, columns \(b_i,\ldots , b_{i+1}-1\) of \(\mathbf {W}_1\) are identical. By Theorem 7, removing columns \(b_i+1,\ldots , b_{i+1}-1\) and merging the cell conditions of cells \(b_i,\ldots , b_{i+1}-1\) yields a workload and a list of cell conditions that is equivalent to \((\mathbf {W}_1, \varPhi _1)\). Thus, merging the cells \(b_i, \ldots , b_{i+1}-1\) for \(i=1,\ldots , 2m-1\) and discarding the remaining cells gives a workload \(\mathbf {W}_2\) and its corresponding list of cell conditions \(\varPhi _2\) such that \((\mathbf {W}_2, \varPhi _2)\equiv (\mathbf {W}_1, \varPhi _1)\) and \(|\varPhi _2| \le 2m-1\). \(\square \)
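The column-merging step of this proof can be sketched for a workload of range queries. The sketch below assumes NumPy and represents each query by a hypothetical inclusive cell range `(lo, hi)`, so its boundaries are `lo` and `hi + 1`. It models only the merging of identical columns, not the cell conditions \(\varPhi\), and checks that at most \(2m-1\) segments remain and that query answers are preserved once counts are summed within each segment.

```python
import numpy as np

def range_workload(queries, n):
    """Rows are indicator vectors of inclusive cell ranges [lo, hi] over n cells."""
    W = np.zeros((len(queries), n))
    for q, (lo, hi) in enumerate(queries):
        W[q, lo:hi + 1] = 1.0
    return W

def compress_cells(queries, n):
    """Merge runs of identical columns between consecutive boundaries b_i <= cell < b_{i+1}."""
    boundaries = sorted({lo for lo, _ in queries} | {hi + 1 for _, hi in queries})
    segments = list(zip(boundaries[:-1], boundaries[1:]))
    W1 = range_workload(queries, n)
    W2 = np.array([[W1[q, start] for start, _ in segments] for q in range(len(queries))])
    return W1, W2, segments

queries = [(2, 9), (5, 14), (0, 3)]      # hypothetical workload with m = 3 range queries
n = 20
W1, W2, segments = compress_cells(queries, n)

# Columns inside each segment are identical, and cells outside all segments are never queried.
for start, end in segments:
    assert all(np.array_equal(W1[:, start], W1[:, j]) for j in range(start, end))
covered = {j for start, end in segments for j in range(start, end)}
assert all(not W1[:, j].any() for j in range(n) if j not in covered)

# Answers agree once cell counts are summed within each segment.
x = np.random.default_rng(9).integers(0, 10, size=n).astype(float)
x_merged = np.array([x[start:end].sum() for start, end in segments])
print(len(segments), np.allclose(W1 @ x, W2 @ x_merged))   # prints: 5 True
```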
Cite this article
Li, C., Miklau, G., Hay, M. et al. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB Journal 24, 757–781 (2015). https://doi.org/10.1007/s00778-015-0398-x