Abstract
We consider range queries on a graph under the constraints of differential privacy, where a query range is the set of edges on a shortest path in the graph. Edges in the graph carry sensitive attributes, and the goal is to report the sum of these attributes along the shortest path for a counting query, or the minimum of these attributes for a bottleneck query. We use differential privacy to ensure that answering these queries does not violate the privacy of the sensitive edge attributes. Our goal is to design mechanisms that minimize the additive error of the output under a given privacy budget.
For this, we develop the first set of non-trivial results for private range queries on shortest paths. For counting range queries we can achieve an additive error of \(\widetilde{O}(n^{1/3})\) for \(\varepsilon \)-DP and \(\widetilde{O}(n^{1/4})\) for \((\varepsilon , \delta )\)-DP. We present two algorithms where we control the final error by carefully balancing perturbation added to the edge attributes directly versus perturbation added to (a subset of) range query answers. Bottleneck range queries are easier and can be answered with polylogarithmic additive errors.
Notes
1. One can use symbolic perturbation of edge distances to produce unique shortest paths.
2. Any set of three vertices \(\{u, v, w\}\) cannot be shattered: if one vertex w lies on the shortest path between the other two vertices u, v, then one cannot obtain the subset \(\{u, v\}\); if none of them lies on the shortest path between the other two, then one cannot obtain the subset \(\{u, v, w\}\).
3. In a directed graph, a directed cycle of u, v, w can be shattered.
References
Abraham, I., Delling, D., Fiat, A., Goldberg, A.V., Werneck, R.F.: VC-dimension and shortest path algorithms. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011. LNCS, vol. 6755, pp. 690–699. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22006-7_58
Acs, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through lossy compression. In: 2012 IEEE 12th International Conference on Data Mining, pp. 1–10. IEEE (2012)
Beimel, A., Moran, S., Nissim, K., Stemmer, U.: Private center points and learning of halfspaces. In: Conference on Learning Theory, pp. 269–282. PMLR (2019)
Beimel, A., Nissim, K., Stemmer, U.: Private learning and sanitization: pure vs. approximate differential privacy. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds.) APPROX/RANDOM 2013. LNCS, vol. 8096, pp. 363–378. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40328-6_26
Bhaskara, A., Dadush, D., Krishnaswamy, R., Talwar, K.: Unconditional differentially private mechanisms for linear queries. In: Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pp. 1269–1284 (2012)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 128–138 (2005)
Blum, A., Ligett, K., Roth, A.: A learning theory approach to noninteractive database privacy. J. ACM 60(2), 12 (2013)
Bun, M., Ullman, J., Vadhan, S.: Fingerprinting codes and the price of approximate differential privacy. SIAM J. Comput. 47(5), 1888–1938 (2018)
Chan, T.H.H., Shi, E., Song, D.: Private and continual release of statistics. ACM Trans. Inf. Syst. Secur. (TISSEC) 14(3), 1–24 (2011)
Chen, J.Y., et al.: Differentially private all-pairs shortest path distances: improved algorithms and lower bounds. In: 2023 Symposium on Discrete Algorithms (SODA 2023) (2023)
Cormode, G., Kulkarni, T., Srivastava, D.: Answering range queries under local differential privacy. Proc. VLDB Endowment 12(10), 1126–1138 (2019)
Cormode, G., Procopiuc, C., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: 2012 IEEE 28th International Conference on Data Engineering, pp. 20–31. IEEE (2012)
Durfee, D., Rogers, R.M.: Practical differentially private top-k selection with pay-what-you-get composition. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada, pp. 3527–3537 (2019). https://proceedings.neurips.cc/paper/2019/hash/b139e104214a08ae3f2ebcce149cdf6e-Abstract.html
Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006). https://doi.org/10.1007/11761679_29
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. J. Priv. Confidentiality 7(3), 17–51 (2016)
Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Dwork, C., Rothblum, G.N., Vadhan, S.: Boosting and differential privacy. In: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 51–60. IEEE (2010)
Fan, C., Li, P.: Distances release with differential privacy in tree and grid graph. arXiv preprint arXiv:2204.12488 (2022)
Fan, C., Li, P., Li, X.: Breaking the linear error barrier in differentially private graph distance release. arXiv preprint arXiv:2204.14247 (2022)
Funke, S., Nusser, A., Storandt, S.: On k-path covers and their applications. Proc. VLDB Endowment 7(10), 893–902 (2014)
Ghane, S., Kulik, L., Ramamohanarao, K.: A differentially private algorithm for range queries on trajectories. Knowl. Inf. Syst. 63(2), 277–303 (2021)
Ghosh, A., Ding, J., Sarkar, R., Gao, J.: Differentially private range counting in planar graphs for spatial sensing. In: Proceedings of the 39th Annual IEEE International Conference on Computer Communications (INFOCOM 2020), pp. 2233–2242 (2020)
Gupta, A., Roth, A., Ullman, J.: Iterative constructions and private data release. In: Cramer, R. (ed.) TCC 2012. LNCS, vol. 7194, pp. 339–356. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28914-9_19
Hardt, M., Ligett, K., McSherry, F.: A simple and practical algorithm for differentially private data release. In: Advances in Neural Information Processing Systems, vol. 25 (2012)
Hardt, M., Rothblum, G.N.: A multiplicative weights mechanism for privacy-preserving data analysis. In: 2010 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 61–70. IEEE (2010)
Hardt, M., Talwar, K.: On the geometry of differential privacy. In: Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pp. 705–714. ACM (2010)
Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: 2009 Ninth IEEE International Conference on Data Mining, pp. 169–178. IEEE (2009)
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially-private histograms through consistency. arXiv preprint arXiv:0904.0942 (2009)
Hong, Y.C., Chen, J.: Graph database to enhance supply chain resilience for industry 4.0. IJISSCM 15(1), 1–19 (2022)
Kaplan, H., Mansour, Y., Stemmer, U., Tsfadia, E.: Private learning of halfspaces: simplifying the construction and reducing the sample complexity. Adv. Neural. Inf. Process. Syst. 33, 13976–13985 (2020)
Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 123–134. ACM (2010)
Li, C., Miklau, G.: Optimal error of query sets under the differentially-private matrix mechanism. In: Proceedings of the 16th International Conference on Database Theory, pp. 272–283 (2013)
Li, Y., Purcell, M., Rakotoarivelo, T., Smith, D., Ranbaduge, T., Ng, K.S.: Private graph data release: a survey. ACM Comput. Surv. 55(11), 1–39 (2023). https://doi.org/10.1145/3569085
Matoušek, J.: Geometric Discrepancy. Springer, Berlin Heidelberg (1999). https://doi.org/10.1007/978-3-642-03942-3
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), pp. 94–103. IEEE (2007)
Muthukrishnan, S., Nikolov, A.: Optimal private halfspace counting via discrepancy. In: Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pp. 1285–1292 (2012)
Nikolov, A., Talwar, K., Zhang, L.: The geometry of differential privacy: the sparse and approximate cases. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 351–360 (2013)
Ogbuke, N.J., Yusuf, Y.Y., Dharma, K., Mercangoz, B.A.: Big data supply chain analytics: ethical, privacy and security challenges posed to business, industries and society. Prod. Plan. Control 33(2–3), 123–137 (2022)
Pourhabibi, T., Ong, K.L., Kam, B.H., Boo, Y.L.: Fraud detection: a systematic literature review of graph-based anomaly detection approaches. Decis. Support Syst. 133, 113303 (2020)
Qardaji, W., Yang, W., Li, N.: Differentially private grids for geospatial data. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 757–768. IEEE (2013)
Qardaji, W., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. Proc. VLDB Endowment 6(14), 1954–1965 (2013)
Qardaji, W., Yang, W., Li, N.: Priview: practical differentially private release of marginal contingency tables. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 1435–1446 (2014)
Qiao, G., Su, W.J., Zhang, L.: Oneshot differentially private top-k selection. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18–24 July 2021, Virtual Event. Proceedings of Machine Learning Research, vol. 139, pp. 8672–8681. PMLR (2021). http://proceedings.mlr.press/v139/qiao21b.html
Sadigurschi, M., Stemmer, U.: On the sample complexity of privately learning axis-aligned rectangles. Adv. Neural. Inf. Process. Syst. 34, 28286–28297 (2021)
Sealfon, A.: Shortest paths and distances with differential privacy. In: Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 29–41 (2016)
Sharma, S., Chen, K., Sheth, A.: Toward practical privacy-preserving analytics for IoT and cloud-based healthcare systems. IEEE Internet Comput. 22(2), 42–51 (2018)
Tao, Y., Sheng, C., Pei, J.: On k-skip shortest paths. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, SIGMOD 2011, pp. 421–432. Association for Computing Machinery, New York (2011)
Toth, C.D., O’Rourke, J., Goodman, J.E.: Handbook of Discrete and Computational Geometry (2017)
Tukey, J.W.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, Vancouver, 1975, vol. 2, pp. 523–531 (1975)
Wainwright, M.J.: High-Dimensional Statistics: A Non-asymptotic Viewpoint, vol. 48. Cambridge University Press, Cambridge (2019)
Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 23(8), 1200–1214 (2010)
Xiao, Y., Xiong, L., Fan, L., Goryczka, S.: DPCube: Differentially private histogram release through multidimensional partitioning. arXiv preprint arXiv:1202.5358 (2012)
Zhang, J., Xiao, X., Xie, X.: Privtree: A differentially private algorithm for hierarchical decompositions. In: Proceedings of the 2016 International Conference on Management of Data, pp. 155–170 (2016)
Acknowledgements
We would like to thank Adam Sealfon, Shyam Narayanan, Justin Chen, Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Jelani Nelson and Yinzhan Xu for useful discussion and suggestions. Deng and Gao have been partially supported by NSF through CCF-2118953, CNS-2137245, CCF-2208663, and CRCNS-2207440.
Appendices
A Range Query on All Paths
When we allow queries along any path in a graph and require differential privacy guarantees, the following result provides a lower bound of \(\Omega (n)\) on the additive error. To show the lower bound, we first consider a range query formulated by the incidence matrix A, with m columns corresponding to the m edges in the graph G and rows corresponding to all queries. A query along path P is represented by a row in the matrix with an element of 1 corresponding to edge e if e is on P and 0 otherwise. We will then talk about the discrepancy of matrix A.
The classical notion of discrepancy of a matrix A is the minimum value of \(||Ax||_{\infty }\) over all vectors x with entries in \(\{+1, -1\}\). The hereditary discrepancy of A is the maximum discrepancy of A restricted to any subset of its columns. As shown in [36], both the discrepancy and the hereditary discrepancy of A provide lower bounds on the additive error of differentially private range queries with incidence matrix A.
Theorem 4
An \((\varepsilon , \delta )\)-differentially private mechanism that answers range queries whose ranges are defined on arbitrary paths of an input graph must incur an additive error of \(\Omega (n)\).
Proof
Consider a graph of \(n+1\) vertices \(v_1, v_2, \cdots , v_{n+1}\) and 2n edges. Between vertices \(v_i\) and \(v_{i+1}\) there are two parallel edges \(e_i\) and \(e'_i\). On this graph there are \(2^n\) paths from \(v_1\) to \(v_{n+1}\). We consider only queries along these paths, so the incidence matrix is a tall matrix A with \(2^n\) rows and 2n columns, the columns corresponding to the 2n edges in the graph. Now we take the submatrix of A consisting only of the columns corresponding to the edges \(e_i\). This gives a \(2^n \times n\) matrix \(A'\) whose rows correspond to all subsets of [n]. \(A'\) has discrepancy \(\Omega (n)\). To see this, consider a vector x that minimizes \(||A'x||_{\infty }\). Suppose x has k entries of \(+1\) and \(n-k\) entries of \(-1\). Without loss of generality, we assume \(k\ge n/2\); otherwise we exchange the roles of \(+1\) and \(-1\). The row of \(A'\) that has value 1 at the positive entries of x and value 0 at the negative entries of x gives a value of \(k\ge n/2\). Thus \(||A'x||_{\infty }\) is at least n/2. Since \(A'\) is a column submatrix of A, the hereditary discrepancy of A is \(\Omega (n)\).
Applying Corollary 1 in [36] to this bound, we conclude that any \((\varepsilon , \delta )\)-differentially private mechanism must incur an error of \(\Omega (n)\).
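The discrepancy claim in the proof can be checked by brute force for small n. The following is a minimal sketch, not part of the original argument: the function name `discrepancy_all_subsets` is hypothetical, and the code enumerates every sign vector x and every subset row of \(A'\), so it is only feasible for very small n.

```python
from itertools import product

def discrepancy_all_subsets(n):
    """Brute-force discrepancy of the matrix whose rows are the
    indicator vectors of all subsets of [n] (the matrix A' in the proof).

    Hypothetical helper for illustration; exponential in n.
    """
    rows = list(product((0, 1), repeat=n))
    best = None
    for x in product((-1, 1), repeat=n):
        # ||A'x||_inf for this sign vector x
        worst = max(abs(sum(r * s for r, s in zip(row, x))) for row in rows)
        best = worst if best is None else min(best, worst)
    return best

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, discrepancy_all_subsets(n))
```

For a sign vector with k positive entries the maximum is attained at \(\max (k, n-k)\), so the computed value should never fall below n/2, in line with the \(\Omega (n)\) bound used in the proof.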
B Proof of Lemma 8 – \((\varepsilon , \delta )\) Algorithm for Tree Graphs
Proof
We first claim that we can answer all-pairs shortest distances on a tree with \((\alpha ,\beta )\)-accuracy for
\[\alpha = O\left( \frac{1}{\varepsilon }\log n \sqrt{\log \left( \frac{n}{\beta }\right) \log \left( \frac{1}{\delta }\right) }\right) ,\]
showing the utility guarantee of Lemma 8. Specifically, if we wish to have high probability bounds for the shortest path distance errors, i.e., \(\beta =O(1/n)\), the error is upper bounded by \(O\left( \frac{1}{\varepsilon }\log ^{1.5} n \sqrt{\log \left( \frac{1}{\delta }\right) }\right) \).
In Sealfon’s algorithm [45], a tree rooted at \(v_0\) is partitioned into subtrees each of at most n/2 vertices. Specifically, define \(v^*\) to be the vertex with at least n/2 descendants but none of \(v^*\)’s children has more than n/2 descendants. The tree is partitioned into the subtrees rooted at the children of \(v^*\), and a subtree of the remaining vertices rooted at \(v_0\). In Sealfon’s algorithm a Laplace noise of \({{\textsf {Lap}}(\log n/\varepsilon )}\) is added to the shortest path distance from \(v_0\) to \(v^*\) and the edges from \(v^*\) to each of its children. The algorithm then repeatedly privatizes each of the subtrees recursively. Using Sealfon’s algorithm, we know that for a given root node \(v_0\), computing the single source (with the root being the source) shortest path distance requires adding at most \(O(\log n)\) privatized edges. Further, their algorithm ensures that any edge can be in at most \(\log n\) levels of recursion and hence can be used to compute \(O(\log n)\) noisy answers. In other words, the number of adaptive compositions we need is \(O(\log n)\).
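The choice of the split vertex \(v^*\) can be sketched as a walk from the root toward the heaviest child. This is an illustrative sketch only, not Sealfon's implementation: the function names and the `children` dictionary representation are hypothetical, and we assume a vertex counts as its own descendant when measuring subtree sizes.

```python
def subtree_sizes(children, root):
    """Number of vertices in the subtree of each vertex (the vertex itself included)."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    # Reversed preorder visits every vertex after all of its descendants.
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return size

def find_split_vertex(children, root):
    """Return (v_star, size): v_star has at least n/2 vertices in its subtree,
    and no child of v_star has more than n/2."""
    size = subtree_sizes(children, root)
    n = size[root]
    v = root
    while True:
        heavy = [c for c in children.get(v, []) if size[c] > n / 2]
        if not heavy:
            return v, size
        v = heavy[0]  # at most one child can exceed n/2

if __name__ == "__main__":
    path = {i: [i + 1] for i in range(6)}  # path 0-1-...-6 rooted at 0
    v_star, size = find_split_vertex(path, 0)
    print(v_star, [size[c] for c in path.get(v_star, [])])
```

Under this convention, every subtree hanging off a child of \(v^*\) and the part of the tree outside the subtree of \(v^*\) each contain at most n/2 vertices, which is what bounds the recursion depth by \(O(\log n)\).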
We use the Gaussian mechanism to privatize the edges. Since we are concerned with approximate-DP guarantee, the variance of the noise required to preserve \((\varepsilon ,\delta )\)-differential privacy is \(\sigma ^2 := O\left( \frac{1}{\varepsilon ^2}\log (1/\delta ) \log n\right) \).
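Fix a node u. Let \(\widehat{d}(u,v_0)\) be the distance estimated by using Sealfon’s algorithm instantiated with the Gaussian mechanism instead of the Laplace mechanism. The added noise terms have zero mean. Therefore,
\[\mathbb {E}\left[ \widehat{d}(u,v_0)\right] = d(u,v_0).\]
Moreover, \(\widehat{d}(u,v_0) - d(u,v_0)\) is a sum of \(O(\log n)\) independent Gaussian noise terms, each of variance \(\sigma ^2\), and hence a zero-mean Gaussian of variance \(\sigma _u^2 = O(\sigma ^2 \log n)\). The standard concentration of the Gaussian distribution [50] implies that, for every \(a > 0\),
\[\Pr \left[ \left| \widehat{d}(u,v_0) - d(u,v_0)\right| > a\right] \le 2\exp \left( -\frac{a^2}{2\sigma _u^2}\right) .\]
Setting \(a = \frac{C}{\varepsilon }\log n \sqrt{\log \left( \frac{2n}{\beta }\right) \log \left( \frac{1}{\delta }\right) }\) for a sufficiently large constant \(C>0\), we have
\[\Pr \left[ \left| \widehat{d}(u,v_0) - d(u,v_0)\right| > a\right] \le \frac{\beta }{n}.\]
Now a union bound over all n vertices u gives
\[\Pr \left[ \max _{u} \left| \widehat{d}(u,v_0) - d(u,v_0)\right| > a\right] \le \beta .\]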
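We can now use the above result to answer all-pairs shortest paths by fixing the root node \(v_0\) and computing single-source shortest distances with the root as the source. Once we have all these estimates, to compute all-pairs shortest distances, for any two vertices \((u,v) \in \mathcal {V} \times \mathcal {V} \), we first compute the least common ancestor z of u and v. We then compute the distance as follows:
\[\widehat{d}(u,v) = \widehat{d}(u,v_0) + \widehat{d}(v,v_0) - 2\,\widehat{d}(z,v_0).\]
Since each of these estimates can be computed with an absolute error of
\[O\left( \frac{1}{\varepsilon }\log n \sqrt{\log \left( \frac{n}{\beta }\right) \log \left( \frac{1}{\delta }\right) }\right) ,\]
we get the final additive error bound. That is, with probability at least \(1-\beta \), for all pairs (u, v),
\[\left| \widehat{d}(u,v) - d(u,v)\right| = O\left( \frac{1}{\varepsilon }\log n \sqrt{\log \left( \frac{n}{\beta }\right) \log \left( \frac{1}{\delta }\right) }\right) ,\]
completing the proof of the claim.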
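The combination step relies on the tree identity \(d(u,v) = d(u,r) + d(v,r) - 2d(z,r)\), where r is the root and z is the least common ancestor of u and v. The sketch below illustrates this step under stated assumptions: the tree is given by hypothetical `parent` and `weight` dictionaries, the function names are ours, and the root distances passed in may be exact or noisy estimates. If every root estimate is within a of the truth, the combined estimate is within 4a.

```python
def root_distances(parent, weight, root):
    """Exact distance from each vertex to the root.
    parent[root] is None; weight[v] is the length of edge (v, parent[v])."""
    dist = {root: 0.0}
    for v in parent:
        chain = []
        while v not in dist:  # climb until a vertex with known distance
            chain.append(v)
            v = parent[v]
        for w in reversed(chain):
            dist[w] = dist[parent[w]] + weight[w]
    return dist

def lca(parent, u, v):
    """Least common ancestor by walking up from both endpoints."""
    ancestors = set()
    while u is not None:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v

def pair_distance(est, parent, u, v):
    """Combine root-distance estimates `est` into an estimate of d(u, v)."""
    z = lca(parent, u, v)
    return est[u] + est[v] - 2 * est[z]

if __name__ == "__main__":
    parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
    weight = {1: 2.0, 2: 5.0, 3: 1.0, 4: 4.0}
    exact = root_distances(parent, weight, 0)
    print(pair_distance(exact, parent, 3, 4), pair_distance(exact, parent, 3, 2))
```

The naive ancestor walk is used here for clarity; any standard least-common-ancestor data structure could replace it without changing the estimate.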
C Proof of Lemma 6
Proof
The lemma is proved by a simple application of the Chernoff bound. For each path P(u, v) with more than \(\frac{n}{\zeta }\) edges, let \(v'\) be the \(\left( \frac{n}{\zeta }+1\right) \)-th vertex on the path P(u, v) from u. Similarly, let \(u'\) be the \(\left( \frac{n}{\zeta }+1\right) \)-th vertex on the path P(u, v) from v (traversing backward). We show that there must be two vertices sampled in S on both \(P(u, v')\) and \(P(u',v)\), which is sufficient to prove the lemma statement.
Define \(X_{u,v'}\) as the random variable for the number of vertices on \(P(u,v')\) that are sampled in S, and define \(X_{z}\) for each \(z \in P(u,v')\) as the indicator random variable for z to be sampled in S. It is straightforward to see that \(X_{u,v'}=\sum _{z\in P(u,v')} X_{z}\). Since \(P(u,v')\) has at least \(\frac{n}{\zeta }\) vertices, and we are sampling \(s=100\, \log {n}\cdot \zeta \) vertices uniformly at random as S, the expected number of vertices on \(P(u,v')\) that are sampled is at least \(100\, \log {n}\). Formally, since each vertex belongs to S with probability \(\frac{s}{n}\), we have
\[\mathbb {E}\left[ X_{u,v'}\right] = \sum _{z\in P(u,v')} \mathbb {E}\left[ X_{z}\right] \ge \frac{n}{\zeta }\cdot \frac{s}{n} = 100\, \log {n}.\]
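As such, by applying the multiplicative Chernoff bound, we have
\[\Pr \left[ X_{u,v'} = 0\right] \le \Pr \left[ X_{u,v'} \le \frac{1}{2}\, \mathbb {E}\left[ X_{u,v'}\right] \right] \le \exp \left( -\frac{\mathbb {E}\left[ X_{u,v'}\right] }{8}\right) \le n^{-12}.\]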
The same argument applies to \(P(u',v)\) by defining \(X_{u',v}\) as the number of vertices on \(P(u',v)\) that are sampled in S. We omit the repetitive details. Finally, although the random variables for different (u, v) pairs are dependent, the union bound holds regardless of the dependence, which gives the desired statement.
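The sampling argument can be illustrated empirically on the simplest possible instance. The sketch below is an assumption-laden illustration rather than the construction used in the paper: it takes the graph to be a single path on vertices 0, ..., n-1, samples s vertices without replacement, and checks that every window of \(\lceil n/\zeta \rceil \) consecutive vertices contains a sample. The function names are hypothetical.

```python
import math
import random

def sample_hitting_set(n, zeta, rng):
    """Sample s = 100 * log(n) * zeta vertices uniformly at random (capped at n).
    Sampling without replacement is an assumption of this sketch."""
    s = min(n, math.ceil(100 * math.log(n) * zeta))
    return set(rng.sample(range(n), s))

def every_window_hit(n, zeta, sample):
    """On the path 0..n-1, check that each window of ceil(n / zeta)
    consecutive vertices contains at least one sampled vertex."""
    window = math.ceil(n / zeta)
    hits = [1 if v in sample else 0 for v in range(n)]
    count = sum(hits[:window])
    if count == 0:
        return False
    for start in range(1, n - window + 1):
        # Slide the window one step: add the new right end, drop the old left end.
        count += hits[start + window - 1] - hits[start - 1]
        if count == 0:
            return False
    return True

if __name__ == "__main__":
    rng = random.Random(0)
    n, zeta = 20000, 4
    S = sample_hitting_set(n, zeta, rng)
    print(len(S), every_window_hit(n, zeta, S))
```

With these parameters each window is expected to contain roughly \(100 \log n\) sampled vertices, so a window with no sample is extremely unlikely.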
D A Remark on Range Query Shortest Path Lower Bound
For counting range queries with an \((\varepsilon , \delta )\)-DP guarantee, there is a lower bound of \(\Omega (n^{1/6})\) on the additive error, adapted from the construction of the lower bound for private all-pairs shortest distances [10]. Specifically, the construction uses a graph where vertices are points in the plane and edges map to line segments between two points that do not contain other vertices. The edge length is the Euclidean length, and therefore the shortest path between two vertices is the path corresponding to a straight line. The range query problem can now be formulated as a special case of linear queries, as in Sect. 6 and Appendix A, where the matrix A is the incidence matrix between the shortest paths and the edges in the graph. It is known that this matrix has a discrepancy lower bound of \(\Omega (n^{1/6})\) [34]. By the connection between discrepancy and linear query lower bounds [36], this is a lower bound for our problem.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Deng, C., Gao, J., Upadhyay, J., Wang, C. (2023). Differentially Private Range Query on Shortest Paths. In: Morin, P., Suri, S. (eds) Algorithms and Data Structures. WADS 2023. Lecture Notes in Computer Science, vol 14079. Springer, Cham. https://doi.org/10.1007/978-3-031-38906-1_23
DOI: https://doi.org/10.1007/978-3-031-38906-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38905-4
Online ISBN: 978-3-031-38906-1
eBook Packages: Computer Science, Computer Science (R0)