TC 11 Briefing Papers
Differentially private graph publishing with degree distribution preservation
Introduction
With the rapid development of information networks (e.g., citation networks and communication networks), a large volume of network data has been generated, enabling a wide spectrum of data analysis tasks. Network data is typically represented as graphs, where nodes represent individuals and edges represent connections between them; in this paper, we henceforth refer to networks as graphs. It has been shown that, even with naively sanitized graph data, an adversary is able to launch different types of privacy attacks that re-identify nodes or reveal edges between them (Hay et al., 2011). Therefore, graph data needs to be sanitized with formal, provable privacy guarantees before it can be released to the public.
Differential privacy (DP) (Dwork et al., 2006) is a widely accepted privacy model. It ensures that the output of a process undergoes sufficient perturbation to mask the presence or absence of any individual in the input, while offering both provable guarantees on the results and practical algorithms to achieve them. Graph data publishing under DP, which is typically implemented by distilling an input graph into structural statistics (e.g., the dK-series), adding noise to the extracted structural features, and then generating a synthetic graph from the perturbed features, has been extensively studied in the past few years (Sala et al., 2011; Wang and Wu, 2013; Xiao et al., 2014). One of the fundamental issues when sanitizing graph data is to avoid disclosure of individuals' sensitive information while still permitting certain analyses on the graph. Degree distribution is an important measure of data utility, as it serves as a fundamental input for many graph analysis tasks, such as modeling worm propagation and designing protocols. In practice, the degree distribution is nontrivially distorted, and the accompanying information loss stems from two aspects. First, it is difficult to capture local and global degree distribution information simultaneously. In particular, existing methods (Sala et al., 2011; Wang and Wu, 2013) distill a graph into the dK-2 series (the degree distribution of connected subgraphs of size 2 within a target graph). While this idea is appealing, generator algorithms have not yet been discovered for dK-series with K ≥ 3, so Sala et al. (2011) and Wang and Wu (2013) capture only local degree information. Second, massive noise is injected into the extracted structural features. For example, the existing methods (Sala et al., 2011; Wang and Wu, 2013) ensure DP on dK-series statistics.
Yet, as graph data is highly correlated, a prohibitive amount of noise must be injected to mask the change of a single edge, which leads to poor overall data utility. To alleviate the impact of a single edge, Xiao et al. (2014) instead encode a graph's structure in terms of connection probabilities between nodes, which, unfortunately, still cannot achieve high data utility on the degree distribution.
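To make the statistics-perturbation pipeline and its noise cost concrete, the sketch below releases a degree histogram under edge-level DP with the Laplace mechanism. This is an illustrative baseline, not the method of any of the cited works; the function name and the choice of histogram statistic are our own assumptions. Note that adding or removing one edge changes the degrees of its two endpoints by 1, so at most four histogram cells each change by 1, giving L1 sensitivity 4.

```python
import numpy as np

def private_degree_histogram(degrees, max_degree, epsilon, rng=None):
    """Release a degree histogram under edge-level DP via the Laplace mechanism.

    One edge change moves each endpoint between adjacent degree bins,
    altering at most four cells by 1 each: L1 sensitivity = 4.
    """
    rng = rng or np.random.default_rng()
    hist = np.bincount(degrees, minlength=max_degree + 1).astype(float)
    sensitivity = 4.0
    noisy = hist + rng.laplace(scale=sensitivity / epsilon, size=hist.shape)
    return np.clip(noisy, 0, None)  # post-processing keeps counts non-negative

# Example: degree sequence of a small graph, released with epsilon = 1
noisy_hist = private_degree_histogram([1, 2, 2, 3], max_degree=3, epsilon=1.0)
```

Even in this toy setting, the Laplace scale 4/ε per cell illustrates why methods that privatize richer correlated statistics (such as dK-series) must add far more noise.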
Recently, the Generative Adversarial Network for graphs (NetGAN) (Bojchevski et al., 2018) has attracted much research interest in machine learning due to its ability to learn both local and global degree correlations of a target graph via biased random walks over a single graph. Its two deep neural networks, the generator and the discriminator, play a minimax game to generate high-quality "fake" samples that are hard to distinguish from real ones. Further, NetGAN allows us to move the burden of privacy preservation to the learning procedure of its discriminator, which eliminates the impact of any single edge.
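The random-walk view can be sketched as follows. For simplicity this sample uses plain first-order walks rather than NetGAN's second-order biased walks, and the function and variable names are illustrative assumptions; the point is only that node co-occurrences along walks encode degree correlations that a generator can learn to reproduce.

```python
import random

def sample_walks(adj, num_walks, walk_len, rng=None):
    """Sample simple random walks from a graph given as an adjacency list.

    NetGAN trains its generator/discriminator pair on (biased) walks like
    these; a synthetic graph is later assembled from the walks the trained
    generator emits.
    """
    rng = rng or random.Random()
    nodes = [v for v in adj if adj[v]]  # start only from non-isolated nodes
    walks = []
    for _ in range(num_walks):
        v = rng.choice(nodes)
        walk = [v]
        for _ in range(walk_len - 1):
            v = rng.choice(adj[v])  # step to a uniformly chosen neighbor
            walk.append(v)
        walks.append(walk)
    return walks

# Toy triangle graph
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
walks = sample_walks(adj, num_walks=4, walk_len=5, rng=random.Random(42))
```

Because each walk touches only a few edges, perturbing the discriminator's training on walks spreads the influence of any single edge thinly, which is the property the paragraph above exploits.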
Motivated by this, we propose Priv-GAN, a private publishing model for graph data based on NetGAN. Unlike previous methods (Sala et al., 2011; Wang and Wu, 2013; Xiao et al., 2014), which publish a sanitized version of the original graph data, we aim to publish a deep generative model trained on the original data in a DP manner. With Priv-GAN, data holders, once equipped with this generative model, are able to produce synthetic graph data that preserves the degree distribution. In contrast to existing solutions, ours offers two significant advantages. First, building on stochastic gradient Langevin dynamics (SGLD) (Raginsky et al., 2017), which has the ability to escape local minima, a private Langevin method with a gradient estimate is designed as the optimizer for the discriminator; it provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients calibrated to this bound. It is worth pointing out that, since GAN and its variants involve a minimax problem formulation, deriving the gradient of such a problem is often a prohibitively complex analytic calculation. For this reason, the gradient estimate (Duchi et al., 2015), which uses only function values in the optimization process, is employed to simplify the calculation. Second, and importantly, the error bound of the noisy Langevin method is theoretically analyzed, which demonstrates that, with appropriate parameter settings, Priv-GAN can maintain high utility guarantees. Briefly, we make the following contributions:
- •
We present Priv-GAN, a private publishing model based on NetGAN, which preserves high data utility on the degree distribution while obeying (ε, δ)-DP.
- •
A private Langevin method with a gradient estimate is designed as the optimizer for the discriminator; it provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients. In particular, the gradient estimate, which uses only function values in the optimization process, simplifies the intricate gradient calculations that arise when using the GAN model and its variants.
- •
The error bound of the noisy Langevin method is theoretically analyzed, revealing that, with appropriate parameter settings, the proposed Priv-GAN is able to maintain high utility guarantees.
- •
Experimental results on real datasets show that Priv-GAN outperforms its most relevant counterparts in synthesizing DP graphs that preserve the degree distribution.
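The first two contributions can be illustrated with a minimal sketch of one noisy Langevin update: a two-function-evaluation (zeroth-order) gradient estimate in the style of Duchi et al. (2015), clipped to a fixed bound, perturbed with Gaussian noise, and combined with the SGLD exploration term. This is our own simplified rendering under stated assumptions, not the paper's exact algorithm; all names and hyperparameters (eta, beta, clip, sigma, mu) are illustrative, and the calibration of sigma to a concrete privacy budget is omitted.

```python
import numpy as np

def two_point_grad(f, theta, mu, rng):
    """Zeroth-order gradient estimate from two function evaluations."""
    u = rng.standard_normal(theta.shape)
    return (f(theta + mu * u) - f(theta - mu * u)) / (2 * mu) * u

def private_langevin_step(f, theta, eta, beta, clip, sigma, mu, rng):
    """One noisy Langevin update with gradient clipping for DP.

    Clipping bounds the per-step gradient norm by `clip`; Gaussian noise
    with std `sigma * clip` masks any single record's influence; the
    sqrt(2*eta/beta) term is the usual SGLD noise that escapes local minima.
    """
    g = two_point_grad(f, theta, mu, rng)
    g = g / max(1.0, np.linalg.norm(g) / clip)  # clip to norm <= clip
    dp_noise = sigma * clip * rng.standard_normal(theta.shape)
    langevin_noise = np.sqrt(2 * eta / beta) * rng.standard_normal(theta.shape)
    return theta - eta * (g + dp_noise) + langevin_noise

# Toy objective: f(theta) = ||theta||^2 / 2, minimized at the origin
rng = np.random.default_rng(0)
theta = np.ones(3)
for _ in range(200):
    theta = private_langevin_step(lambda t: 0.5 * t @ t, theta, eta=0.05,
                                  beta=1e4, clip=1.0, sigma=0.1, mu=1e-3,
                                  rng=rng)
```

The sketch shows why the clipping bound matters: it is what lets the Gaussian noise scale be set independently of the (analytically intractable) true gradient of the minimax objective.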
The remainder of this paper is organized as follows. We discuss related work in Section 2. Section 3 describes the preliminaries of our solution. The problem statement is introduced in Section 4. Our proposed solution is presented with detailed analysis in Section 5. The experimental results are reported in Section 6, followed by the conclusion in Section 7.
Section snippets
Related work
In this section, we first review previous work on two directions, namely DP-based graph statistics publishing and DP-based graph publishing, and discuss how our work differs from existing work. Then, we briefly introduce the DP-based deep learning.
Preliminaries
In this section, we first review DP and GAN, and then introduce SGLD and Gaussian smoothing, in which we also explain why SGLD is employed.
Problem statement
Throughout this paper, we investigate the following setup. A graph database curator wishes to publish a sanitized graph that mimics an original graph in terms of some important characteristics, such as degree distribution, while satisfying (ε, δ)-DP. To this end, we start from a state-of-the-art model, NetGAN, which can capture both local and global degree correlations over a single graph via biased random walks. Our objective is to develop a DP NetGAN, Priv-GAN, by devising a private Langevin method with a gradient estimate.
The proposed solution priv-GAN
In this section, we elaborate on the design of Priv-GAN, a NetGAN-based DP graph publishing approach that protects the privacy of individuals in the released graph while theoretically retaining high data utility on the degree distribution.
Experiments
The purpose of our experiments is to empirically validate the following claim: our Priv-GAN model outperforms preceding models in terms of the preservation of degree distribution, while remaining comparable in terms of the effective preservation of other essential graph structural properties.
Conclusion
In this paper, we have proposed Priv-GAN, a DP graph publishing model based on NetGAN. The underlying highlights lie in the following aspects: (i) a private Langevin method with a gradient estimate is designed as the optimizer for the discriminator, which provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients; and (ii) the error bound of the noisy Langevin method is theoretically analyzed, which demonstrates that, with appropriate parameter settings, Priv-GAN is able to maintain high utility guarantees.
CRediT authorship contribution statement
Sen Zhang: Conceptualization, Methodology, Software, Writing - original draft. Weiwei Ni: Conceptualization, Writing - review & editing, Supervision. Nan Fu: Conceptualization, Methodology, Software, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no conflicts of interest related to this work, and no commercial or associative interests that represent a conflict of interest in connection with the work submitted.
Acknowledgments
The authors would like to sincerely thank anonymous reviewers for their valuable and constructive suggestions and encouraging comments. The work was supported by the National Natural Science Foundation of China under grant 61772131. Besides, it was partially supported by the State Grid Corporation of China Project (5700-202018268A-0-0-00).
References (45)
- et al. Improved recurrent generative adversarial networks with regularization techniques and a controllable framework. Inf. Sci. (2020).
- et al. Differential privacy preserving spectral graph analysis. Pacific-Asia Conference on Knowledge Discovery and Data Mining (2013).
- et al. Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016).
- et al. Differentially private mixture of generative neural networks. IEEE Trans. Knowl. Data Eng. (2018).
- et al. Publishing social network graph eigen-spectrum with privacy guarantees. IEEE Trans. Network Sci. Eng. (2020).
- et al. On differentially private graph sparsification and applications. Advances in Neural Information Processing Systems (2019).
- et al. A simple proof of the Poincaré inequality for a large class of probability measures. Electronic Communications in Probability (2008).
- et al. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability (2016).
- et al. NetGAN: Generating graphs via random walks. Proceedings of the 35th International Conference on Machine Learning (2018).
- et al. Correlated network data publication via differential privacy. VLDB J. (2014).
- Recursive mechanism: towards node differential privacy and unrestricted joins. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data.
- Diffusion for global optimization in R^n. SIAM J. Control Optim.
- On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
- Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory.
- Calibrating noise to sensitivity in private data analysis. TCC.
- The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science.
- Differentially private release of synthetic graphs. Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
- Preserving persistent homology in differentially private graph publications. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
- Generative adversarial nets. Advances in Neural Information Processing Systems.
- Iterative constructions and private data release. Theory of Cryptography Conference.
- Accurate estimation of the degree distribution of private networks. 2009 Ninth IEEE International Conference on Data Mining.
- Privacy-aware data management in information networks. Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data.
SEN ZHANG is currently pursuing the Ph.D. degree with the Complex Data Management Laboratory, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China. His research interests include data mining, data privacy protection, big data technology, and complex data management.
WEIWEI NI received the B.E. and Ph.D. degrees in computer science from Southeast University, China, in 2001 and 2005, respectively. He is currently a Professor with Southeast University. His research interests include data mining, data provenance, data privacy protection, big data technology, and complex data management.
NAN FU is currently pursuing the Ph.D. degree with the Complex Data Management Laboratory, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China. His research interests include data mining and data privacy protection.