ABSTRACT
Information networks, such as social media and email networks, often contain sensitive information, and releasing such network data can seriously jeopardize individual privacy. Network data therefore need to be sanitized before release. In this paper, we present a novel data sanitization solution that infers a network's structure in a differentially private manner. We observe that, by estimating the connection probabilities between vertices instead of considering the observed edges directly, the noise scale enforced by differential privacy can be greatly reduced. Our proposed method infers the network structure using a statistical hierarchical random graph (HRG) model. The guarantee of differential privacy is achieved by sampling possible HRG structures from the model space via Markov chain Monte Carlo (MCMC). We theoretically prove that the sensitivity of such inference is only O(log n), where n is the number of vertices in the network; this bound implies that less noise needs to be injected than in existing works. We experimentally evaluate our approach on four real-life network datasets and show that our solution effectively preserves essential structural properties such as the degree distribution, the shortest-path-length distribution, and influential nodes.
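The sampling idea in the abstract can be illustrated with a toy sketch. Everything below is an illustrative assumption rather than the paper's actual algorithm: the graph data, the simplistic leaf-swap proposal move, and the `sens` placeholder are invented for demonstration (the paper uses proper subtree moves and formally derives the O(log n) sensitivity bound). The sketch only shows the shape of the approach: score each candidate HRG dendrogram by its likelihood, and accept MCMC proposals with an exponential-mechanism-style probability exp(ε·ΔL/(2Δ)).

```python
import math
import random

# Toy undirected graph (illustrative data, not from the paper's experiments).
EDGES = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3),
                                (3, 4), (3, 5), (4, 5)]}
N = 6

def leaf_set(t):
    """Vertices under a dendrogram node (leaves are ints, internal nodes are pairs)."""
    return {t} if isinstance(t, int) else leaf_set(t[0]) | leaf_set(t[1])

def loglik(tree, edges):
    """HRG log-likelihood: each internal node r contributes
    E_r*log(p_r) + (L_r*R_r - E_r)*log(1 - p_r), with p_r = E_r / (L_r*R_r),
    where E_r counts edges crossing between r's left and right subtrees."""
    if isinstance(tree, int):
        return 0.0
    L, R = leaf_set(tree[0]), leaf_set(tree[1])
    e = sum(1 for u in L for v in R if frozenset((u, v)) in edges)
    n = len(L) * len(R)
    h = e * math.log(e / n) + (n - e) * math.log(1 - e / n) if 0 < e < n else 0.0
    return h + loglik(tree[0], edges) + loglik(tree[1], edges)

def swap_leaves(tree):
    """Propose a neighbouring dendrogram by exchanging two leaf labels
    (a deliberate simplification of the subtree moves used by HRG samplers)."""
    paths = []
    def collect(t, p):
        if isinstance(t, int):
            paths.append(p)
        else:
            collect(t[0], p + (0,))
            collect(t[1], p + (1,))
    collect(tree, ())
    a, b = random.sample(paths, 2)
    def get(t, p):
        for i in p:
            t = t[i]
        return t
    va, vb = get(tree, a), get(tree, b)
    def put(t, p, v):
        if not p:
            return v
        return (put(t[0], p[1:], v), t[1]) if p[0] == 0 else (t[0], put(t[1], p[1:], v))
    return put(put(tree, a, vb), b, va)

def private_mcmc(edges, tree, eps=1.0, steps=2000, sens=math.log(N)):
    """Sample a dendrogram, accepting a proposal with probability
    min(1, exp(eps * (L_new - L_cur) / (2 * sens)))  -- the exponential
    mechanism's acceptance rule, with `sens` standing in for sensitivity."""
    cur = loglik(tree, edges)
    for _ in range(steps):
        cand = swap_leaves(tree)
        cl = loglik(cand, edges)
        if random.random() < math.exp(min(0.0, eps * (cl - cur) / (2.0 * sens))):
            tree, cur = cand, cl
    return tree, cur
```

Raising `eps` concentrates the chain on high-likelihood dendrograms (better utility); lowering it flattens the acceptance rule toward uniform sampling over structures (stronger privacy).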
Index Terms
- Differentially private network data release via structural inference