Skip to main content
Log in

Correlated network data publication via differential privacy

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

With the increasing prevalence of information networks, research on privacy-preserving network data publishing has received substantial attention recently. There are two streams of relevant research, targeting different privacy requirements. A large body of existing works focus on preventing node re-identification against adversaries with structural background knowledge, while some other studies aim to thwart edge disclosure. In general, the line of research on preventing edge disclosure is less fruitful, largely due to lack of a formal privacy model. The recent emergence of differential privacy has shown great promise for rigorous prevention of edge disclosure. Yet recent research indicates that differential privacy is vulnerable to data correlation, which hinders its application to network data that may be inherently correlated. In this paper, we show that differential privacy could be tuned to provide provable privacy guarantees even in the correlated setting by introducing an extra parameter, which measures the extent of correlation. We subsequently provide a holistic solution for non-interactive network data publication. First, we generate a private vertex labeling for a given network dataset to make the corresponding adjacency matrix form dense clusters. Next, we adaptively identify dense regions of the adjacency matrix by a data-dependent partitioning process. Finally, we reconstruct a noisy adjacency matrix by a novel use of the exponential mechanism. To our best knowledge, this is the first work providing a practical solution for publishing real-life network data via differential privacy. Extensive experiments demonstrate that our approach performs well on different types of real-life network datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21

Similar content being viewed by others

Notes

  1. The strong attacker mentioned in [24] cannot be prevented as his prior knowledge (without accessing any database) has allowed him to succeed in a privacy attack. Such a privacy attack is not caused by data sharing and therefore beyond the scope of this paper.

  2. A random vertex labeling satisfies 0-differential privacy because it can be viewed as an application of the exponential mechanism with privacy parameter 0.

  3. If \(v_i\) and \(v_j\) are selected to swap with each other, only one query will be affected. This leads to a lower global sensitivity.

  4. For a region \(R\) not containing elements on the diagonal, \(GS(f)=1\) because a single edge difference cannot change two elements in this region.

  5. ca-GrQc, ca-HepTh and wiki-Vote are publicly available in the Stanford large network dataset collection (http://snap.stanford.edu/data/index.html). STM is provided by the Société de transport de Montréal (http://www.stm.info).

  6. Laplace can barely provide any useful information in terms of degree distribution because its KL-divergence is almost the same as an empty graph (i.e., \(|E| = 0\)).

  7. Step 1 means that sampling is not employed.

References

  1. Backstrom, L., Dwork, C., Kleinberg, J.: Wherefore art thou r3579x? anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of the 16th International Conference on World Wide Web (WWW), pp. 181–190 (2007)

  2. Bearman, P.S., Moody, J., Stovel, K.: Chains of affection: the structure of adolescent romantic and sexual networks. Am. J. Sociol. 110(1), 44–91 (2004)

    Article  Google Scholar 

  3. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  4. Bhagat, S., Cormode, G., Krishnamurthy, B., Srivastava, D.: Class-based graph anonymization for social network data. Proc. VLDB Endow. 2(1), 766–777 (2009)

    Article  Google Scholar 

  5. Chen, R., Mohammed, N., Fung, B.C.M., Desai, B.C., Xiong, L.: Publishing set-valued data via differential privacy. Proc. VLDB Endow. 4(11), 1087–1098 (2011)

    Google Scholar 

  6. Cheng, J., Fu, A.W.C., Liu, J.: K-isomorphism: privacy preserving network publication against structural attacks. In: Proceedings of the 36th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 459–470 (2010)

  7. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms, 3rd edn. McGraw-Hill Higher Education, New York (2009)

    MATH  Google Scholar 

  8. Cormode, G., Srivastava, D., Yu, T., Zhang, Q.: Anonymizing bipartite graph data using safe groupings. Proc. VLDB Endow. 1(1), 833–844 (2008)

    Article  Google Scholar 

  9. Cormode, G., Procopiuc, M., Shen, E., Srivastava, D., Yu, T.: Differentially private spatial decompositions. In: Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE), pp. 20–31 (2012)

  10. Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th National Conference, pp. 157–172 (1969)

  11. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference (TCC), pp. 265–284 (2006)

  12. Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the internet topology. In: Proceedings of the 23rd ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), pp. 251–262 (1999)

  13. Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4(1), 1–9 (1974)

    Article  MATH  Google Scholar 

  14. Fortunato, S.: Community Detection in Graphs. CoRR abs/0906.0612 (2009)

  15. Gupta, A., Ligett, K., McSherry, F., Roth, A., Talwar, K.: Differentially private combinatorial optimization. In: Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1106–1125 (2010)

  16. Gupta, A., Roth, A., Ullman, J.: Iterative constructions and private data release. In: Proceedings of the 9th Theory of Cryptography Conference (TCC), pp. 339–356 (2012)

  17. Hay, M., Miklau, G., Jensen, D., Towsley, D.F., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)

    Article  Google Scholar 

  18. Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: Proceedings of the 9th IEEE International Conference on Data Mining (ICDM), pp. 169–178 (2009)

  19. Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3(1), 1021–1032 (2010)

    Article  Google Scholar 

  20. Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: Proceedings of the 37th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 1201–1204 (2011)

  21. Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 500–509 (1994)

  22. Karwa, V., Raskhodnikova, S., Smith, A., Yaroslavtsev, G.: Private analysis of graph structure. Proc VLDB Endow. 4(11), 1146–1157 (2011)

    Google Scholar 

  23. Kifer, D., Gehrke, J.: Injecting utility into anonymized datasets. In: Proceedings of the 32nd ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 217–228 (2006)

  24. Kifer, D., Machanavajjhala, A.: No free lunch in data privacy. In: Proceedings of the 37th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 193–204 (2011)

  25. Kossinets, G., Watts, D.: Empirical analysis of an evolving social networks. Science 311(5757), 88–90 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  26. Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the 34th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 93–106 (2008)

  27. Liu, L., Wang, J., Liu, J., Zhang, J.: Privacy preservation in social networks with sensitive edge weights. In: Proceedings of the 9th SIAM International Conference on Data Mining (SDM), pp. 954–965 (2009)

  28. McSherry, F.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 19–30 (2009)

  29. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 94–103 (2007)

  30. Mohammed, N., Chen, R., Fung, B.C.M., Yu, P.S.: Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 493–501 (2011)

  31. Proserpio, D., Goldberg, S., McSherry, F.: A workflow for differentially-private graph synthesis. In: Proceedings of the 2012 ACM Workshop on Online Social Networks (WOSN), pp. 13–18 (2012)

  32. Sala, A., Zhao, X., Wilson, C., Zheng, H., Zhao, B.Y.: Sharing graphs using differentially private graph models. In: Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement (IMC), pp. 81–98 (2011)

  33. Schaeffer, S.E.: Graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007)

    Article  MathSciNet  Google Scholar 

  34. Wu, L., Ying, X., Wu, X.: Reconstruction from randomized graph via low rank approximation. In: Proceedings of the 10th SIAM International Conference on Data Mining (SDM), pp. 60–71 (2010)

  35. Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: Proceedings of the 26th IEEE International Conference on Data Engineering (ICDE), pp. 225–236 (2010)

  36. Xiao, X., Bender, G., Hay, M., Gehrke, J.: iReduct: differential privacy with reduced relative errors. In: Proceedings of the 37th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 229–240 (2011)

  37. Ying, X., Wu, X.: Randomizing social networks: a spectrum preserving approach. In: Proceedings of the 8th SIAM International Conference on Data Mining (SDM), pp. 739–750 (2008)

  38. Yuan, M., Chen, L., Yu, P.S.: Personalized privacy protection in social networks. Proc. VLDB Endow. 4(2), 141–150 (2011)

    Article  Google Scholar 

  39. Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE), pp. 506–515 (2008)

  40. Zou, L., Chen, L., Ozsu, M.T.: K-automorphism: a general framework for privacy preserving network publication. Proc. VLDB Endow. 2(1), 946–957 (2009)

    Article  Google Scholar 

Download references

Acknowledgments

We sincerely thank the reviewers for their insightful comments. We thank James Cheng, Ada Wai-Chee Fu and Jia Liu for providing the source code of \(k\)-isomorphism. The research is supported in part by NSERC through Discovery Grants (356065-2013), US NSF through grants CNS-1115234, DBI-0960443 and OISE-1129076 and US Department of Army through grant W911NF-12-1-0066.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rui Chen.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chen, R., Fung, B.C.M., Yu, P.S. et al. Correlated network data publication via differential privacy. The VLDB Journal 23, 653–676 (2014). https://doi.org/10.1007/s00778-013-0344-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-013-0344-8

Keywords

Navigation