TC 11 Briefing Papers
Differentially private graph publishing with degree distribution preservation

https://doi.org/10.1016/j.cose.2021.102285

Abstract

The goal of privacy-preserving graph publishing is to protect individual privacy in released graph data while preserving data utility. Degree distribution, which underpins many graph analysis tasks, is a crucial measure of data utility. Yet existing methods based on differential privacy (DP) cannot preserve the degree distribution well, since they distill a graph into a set of structural statistics (e.g., the dK-series) that capture only local degree correlations, and they require massive noise to mask the change of a single edge. Recently, the Generative Adversarial Network for graphs (NetGAN) has attracted much attention in machine learning, owing to its ability to capture both the local and global degree distribution of a graph via biased random walks. Moreover, it allows us to move the burden of privacy preservation to the learning procedure of its discriminator rather than to extracted structural features. Inspired by this, we propose Priv-GAN, a private publishing model based on NetGAN. Instead of distilling and then publishing graphs, we publish the Priv-GAN model itself, trained on the original data in a DP manner. With Priv-GAN, data holders can produce synthetic graph data that preserves the degree distribution. Compared to alternative solutions, ours offers two highlights: (i) a private Langevin optimizer with gradient estimation is designed for the discriminator, which provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients; and (ii) importantly, the error bound of the noisy Langevin method is analyzed theoretically, demonstrating that with appropriate parameter settings Priv-GAN maintains high utility guarantees. Experimental results confirm our theoretical findings and the efficacy of Priv-GAN.

Introduction

With the rapid development of information networks (e.g., citation networks and communication networks), a large volume of network data has been generated, which enables a wide spectrum of data analysis tasks. Network data is typically represented as graphs, where nodes represent a set of individuals and edges represent connections between them; throughout this paper, we use the term graph in place of network. It has been shown that, given naively sanitized graph data, an adversary can launch various privacy attacks that re-identify nodes or reveal edges between them (Hay et al., 2011). Therefore, graph data needs to be sanitized with formal, provable privacy guarantees before it can be released to the public.

Differential privacy (DP) (Dwork et al., 2006) is a widely accepted privacy model, which ensures that the output of a process is perturbed sufficiently to mask the presence or absence of any individual in the input, while offering both provable properties on the results and practical algorithms to achieve them. Graph data publishing under DP, typically implemented by distilling an input graph into structural statistics (e.g., the dK-series), adding noise to the extracted structural features, and then generating a synthetic graph from the perturbed features, has been studied extensively in recent years (Sala et al., 2011), (Wang and Wu, 2013), (Xiao et al., 2014). A fundamental issue when sanitizing graph data is to avoid disclosing individuals’ sensitive information while still permitting meaningful analysis of the graph. Degree distribution is an important data utility because it underpins many graph analysis tasks, such as modeling worm propagation and designing protocols. In practice, the degree distribution is distorted nontrivially, and the accompanying information loss stems from two sources. First, it is difficult to capture local and global degree-distribution information simultaneously. In particular, existing methods (Sala et al., 2011), (Wang and Wu, 2013) distill a graph into a set of dK-2 series (the degree distribution of connected components of size K = 2 within a target graph). While this idea is appealing, generator algorithms have not yet been discovered for dK-series with K ≥ 3, so Sala et al. (2011) and Wang and Wu (2013) capture only local degree information. Second, massive noise is injected into the extracted structural features. For example, the existing methods (Sala et al., 2011), (Wang and Wu, 2013) ensure DP on dK-series statistics. Yet, because graph data is highly correlated, a prohibitive amount of noise must be injected to mask the change of a single edge, which leads to poor overall data utility. To alleviate the impact of a single edge, Xiao et al. (2014) subsequently encode a graph’s structure in terms of connection probabilities between nodes, which, unfortunately, still fails to achieve high data utility on the degree distribution.
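To make the dK-2 notion concrete, the following sketch (illustrative only, not one of the cited generators) computes the dK-2 series of an edge list. Note that rewiring a single edge changes the degrees of both endpoints, which in turn shifts every dK-2 entry those nodes participate in; this correlation is why DP on dK-series statistics requires heavy noise.

```python
from collections import Counter

def dk2_series(edges):
    """Compute the dK-2 series of an undirected graph: for every edge
    (u, v), count the unordered pair of endpoint degrees (deg(u), deg(v))."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    series = Counter()
    for u, v in edges:
        a, b = sorted((deg[u], deg[v]))
        series[(a, b)] += 1
    return dict(series)

# A 4-node path a-b-c-d: degrees are a:1, b:2, c:2, d:1.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(dk2_series(edges))  # {(1, 2): 2, (2, 2): 1}
```

Removing the single edge (b, c) here would change the degrees of b and c and thereby alter all three entries, illustrating the high sensitivity of dK-2 statistics.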

Recently, the Generative Adversarial Network for graphs (NetGAN) (Bojchevski et al., 2018) has attracted much research interest in machine learning. Using two deep neural networks, a generator and a discriminator, trained in a minimax game, it learns both local and global degree correlations of a target graph via biased random walks over a single graph and generates high-quality “fake” samples that are hard to distinguish from real ones. Moreover, it allows us to move the burden of privacy preservation to the learning procedure of its discriminator, which eliminates the impact of a single edge.
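As a rough illustration of the walk-based training input described above, the sketch below samples fixed-length random walks from an edge list. NetGAN itself uses biased second-order walks (node2vec-style); this simplified first-order sampler only shows the shape of the data a walk-based model consumes.

```python
import random
from collections import defaultdict

def sample_walks(edges, num_walks, walk_len, seed=0):
    """Sample fixed-length first-order random walks from an undirected graph.
    Simplified sketch: NetGAN uses biased second-order walks instead."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        cur = rng.choice(nodes)          # random start node
        walk = [cur]
        for _ in range(walk_len - 1):
            cur = rng.choice(adj[cur])   # step to a uniform random neighbor
            walk.append(cur)
        walks.append(walk)
    return walks

walks = sample_walks([("a", "b"), ("b", "c"), ("c", "d")],
                     num_walks=4, walk_len=5)
```

Each walk is a node sequence whose consecutive pairs are edges of the input graph; a collection of such walks exposes both local neighborhoods and longer-range structure to the learner.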

Motivated by this, we propose Priv-GAN, a private publishing model for graph data based on NetGAN. Unlike previous methods (Sala et al., 2011), (Wang and Wu, 2013), (Xiao et al., 2014), which publish a sanitized version of the original graph data, we aim to publish a deep generative model trained on the original data in a DP manner. With Priv-GAN, data holders equipped with this generative model can produce synthetic graph data that preserves the degree distribution. Compared to alternative solutions, ours has two significant advantages. First, building on stochastic gradient Langevin dynamics (SGLD) (Raginsky et al., 2017), which can escape local minima, a private Langevin optimizer with gradient estimation is designed for the discriminator; it provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients calibrated to this bound. It is worth pointing out that since GAN and its variants involve a minimax formulation, deriving the gradient of such a problem analytically is often prohibitively complex. For this reason, the gradient estimate (Duchi et al., 2015), which uses only function values in the optimization process, is employed to simplify the calculation. Second, and importantly, the error bound of the noisy Langevin method is analyzed theoretically, which demonstrates that with appropriate parameter settings Priv-GAN can maintain high utility guarantees. Briefly, we make the following contributions:

  • We present Priv-GAN, a private publishing model based on NetGAN, which can preserve high data utility on degree distribution while obeying (ε,δ)-DP.

  • A private Langevin optimizer with gradient estimation is designed for the discriminator; it provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients. In particular, the gradient estimate, which uses only function values in the optimization process, simplifies the intricate gradient calculations that arise with the GAN model and its variants.

  • The error bound of the noisy Langevin method is theoretically analyzed, which reveals that with appropriate parameter settings, the proposed Priv-GAN is able to maintain high utility guarantees.

  • Experimental results on real datasets show that Priv-GAN outperforms its most relevant counterparts in synthesising DP graphs that preserve the degree distribution.
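To sketch the flavor of the first contribution, a single DP Langevin update with a two-point (zeroth-order) gradient estimate in the style of Duchi et al. (2015) might look as follows. All parameter names are hypothetical, and this is a simplified illustration, not the paper's exact algorithm.

```python
import numpy as np

def private_langevin_step(f, theta, eta, delta, clip, sigma, beta, rng):
    """One illustrative DP Langevin update (hypothetical parameterization).
    f: scalar loss evaluated on private data; theta: parameter vector;
    eta: step size; delta: smoothing radius; clip: gradient bound;
    sigma: DP noise multiplier; beta: inverse temperature."""
    d = theta.size
    u = rng.standard_normal(d)                          # random direction
    # Two-point gradient estimate: only loss values are needed.
    g = (f(theta + delta * u) - f(theta - delta * u)) / (2 * delta) * u
    g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # enforce bound
    g = g + sigma * clip * rng.standard_normal(d)       # Gaussian noise for DP
    xi = rng.standard_normal(d)                         # Langevin diffusion noise
    return theta - eta * g + np.sqrt(2 * eta / beta) * xi

# Toy run on a quadratic loss (stands in for the discriminator loss).
rng = np.random.default_rng(0)
theta = np.ones(3)
f = lambda t: float(np.sum(t ** 2))
for _ in range(200):
    theta = private_langevin_step(f, theta, eta=0.05, delta=1e-3,
                                  clip=5.0, sigma=0.1, beta=1e4, rng=rng)
```

The clip-then-noise order mirrors the standard DP recipe: the explicit gradient bound fixes the sensitivity, so Gaussian noise scaled to that bound suffices for (ε,δ)-DP accounting, while the extra diffusion term gives the update its Langevin character.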

The remainder of our work is organized as follows. We discuss related work in Section 2. Section 3 describes the preliminaries of our solution. The problem statement is introduced in Section 4. Our proposed solution is presented with detailed analysis in Section 5. The experimental results are reported in Section 6, followed by the conclusion in Section 7.

Related work

In this section, we first review previous work on two directions, namely DP-based graph statistics publishing and DP-based graph publishing, and discuss how our work differs from existing work. Then, we briefly introduce the DP-based deep learning.

Preliminaries

In this section, we first review DP and GAN, and then introduce SGLD and Gaussian smoothing, in which we also explain why SGLD is employed.

Problem statement

Throughout this paper, we investigate the following setup. A graph database curator wishes to publish a sanitized graph G˜ that mimics an original graph G in terms of important characteristics such as the degree distribution, while satisfying (ε,δ)-DP. To this end, we start from a state-of-the-art model, NetGAN, which captures both local and global degree correlations over a single graph via biased random walks. Our objective is to develop a DP NetGAN, Priv-GAN, by devising a private Langevin optimizer with gradient estimation for its discriminator.

The proposed solution Priv-GAN

In this section, we elaborate on the design of Priv-GAN, a NetGAN-based DP graph publishing approach that protects the privacy of individuals in the released graph while theoretically retaining high data utility on the degree distribution.

Experiments

The purpose of our experiments is to empirically validate the following claim: our Priv-GAN model outperforms preceding models in preserving the degree distribution, while remaining comparable in preserving other essential graph structural properties.

Conclusion

In this paper, we have proposed Priv-GAN, a DP graph publishing model based on NetGAN. Its underlying highlights lie in the following aspects: (i) a private Langevin optimizer with gradient estimation is designed for the discriminator, which provides a theoretical upper bound on the gradient and achieves DP by adding noise to the gradients; and (ii) the error bound of the noisy Langevin method is analyzed theoretically, which demonstrates that with appropriate parameter settings, Priv-GAN is able to maintain high utility guarantees.

CRediT authorship contribution statement

Sen Zhang: Conceptualization, Methodology, Software, Writing - original draft. Weiwei Ni: Conceptualization, Writing - review & editing, Supervision. Nan Fu: Conceptualization, Methodology, Software, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no conflicts of interest related to this work, and no commercial or associative interest that represents a conflict of interest in connection with the submitted work.

Acknowledgments

The authors would like to sincerely thank anonymous reviewers for their valuable and constructive suggestions and encouraging comments. The work was supported by the National Natural Science Foundation of China under grant 61772131. Besides, it was partially supported by the State Grid Corporation of China Project (5700-202018268A-0-0-00).

SEN ZHANG is currently pursuing the Ph.D. degree with the Complex Data Management Laboratory, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China. His research interests include data mining, data privacy protection, big data technology, and complex data management.

References (45)

  • M. Lee et al., Improved recurrent generative adversarial networks with regularization techniques and a controllable framework, Inf. Sci. (Ny), 2020.

  • Y. Wang et al., Differential privacy preserving spectral graph analysis, Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2013.

  • M. Abadi et al., Deep learning with differential privacy, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016.

  • G. Acs et al., Differentially private mixture of generative neural networks, IEEE Trans. Knowl. Data Eng., 2018.

  • F. Ahmed et al., Publishing social network graph eigen-spectrum with privacy guarantees, IEEE Trans. Network Sci. Eng., 2020.

  • R. Arora et al., On differentially private graph sparsification and applications, Advances in Neural Information Processing Systems, 2019.

  • D. Bakry et al., A simple proof of the Poincaré inequality for a large class of probability measures, Electronic Communications in Probability, 2008.

  • A.S. Bandeira et al., Sharp nonasymptotic bounds on the norm of random matrices with independent entries, The Annals of Probability, 2016.

  • A. Bojchevski et al., NetGAN: generating graphs via random walks, Proceedings of the 35th International Conference on Machine Learning, 2018.

  • R. Chen et al., Correlated network data publication via differential privacy, VLDB J., 2014.

  • S. Chen et al., Recursive mechanism: towards node differential privacy and unrestricted joins, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 2013.

  • T.S. Chiang et al., Diffusion for global optimization in R^n, SIAM J. Control Optim., 1987.

  • K. Cho et al., On the properties of neural machine translation: encoder-decoder approaches, arXiv preprint arXiv:1409.1259, 2014.

  • J.C. Duchi et al., Optimal rates for zero-order convex optimization: the power of two function evaluations, IEEE Trans. Inf. Theory, 2015.

  • C. Dwork et al., Calibrating noise to sensitivity in private data analysis, TCC, 2006.

  • C. Dwork et al., The algorithmic foundations of differential privacy, Foundations and Trends in Theoretical Computer Science, 2014.

  • M. Eliáš et al., Differentially private release of synthetic graphs, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2020.

  • T. Gao et al., Preserving persistent homology in differentially private graph publications, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019.

  • I. Goodfellow et al., Generative adversarial nets, Advances in Neural Information Processing Systems, 2014.

  • A. Gupta et al., Iterative constructions and private data release, Theory of Cryptography Conference, 2012.

  • M. Hay et al., Accurate estimation of the degree distribution of private networks, 2009 Ninth IEEE International Conference on Data Mining, 2009.

  • M. Hay et al., Privacy-aware data management in information networks, Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, 2011.

    WEIWEI NI received the B.E. and Ph.D. degrees in computer science from Southeast University, China, in 2001 and 2005, respectively. He is currently a Professor with Southeast University. His research interests include data mining, data provenance, data privacy protection, big data technology, and complex data management.

    NAN FU is currently pursuing the Ph.D. degree with the Complex Data Management Laboratory, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China. His research interests include data mining and data privacy protection.
