On the anonymizability of graphs

  • Regular Paper
Knowledge and Information Systems

Abstract

Many applications, such as social networks, recommendation systems, email communication patterns, and other collaborative applications, are built on top of graph infrastructures. The data stored in such networks may contain personal information about individuals and may therefore be sensitive from a privacy point of view. A natural solution is to remove identifying information from the nodes and to perturb the graph structure so that re-identification becomes difficult. However, typical graphs encountered in real applications are sparse, and in this paper we show that sparse graphs have certain theoretical properties that make them susceptible to re-identification attacks. We design a systematic way to exploit these properties in order to construct re-identification signatures, also referred to as characteristic vectors. These signatures are extremely robust to perturbation, especially for sparse graphs, and we use them to build an effective attack algorithm. We supplement our theoretical results with experiments using a number of algorithms on real data sets. These results confirm that achieving even low levels of anonymization requires perturbation large enough to cause a massive loss of utility. Our experiments also show that the true anonymization level of graphs is much lower than is implied by measures such as \(k\)-anonymity. Thus, the results of this paper establish that graph anonymization faces fundamental theoretical barriers that prevent a fully effective solution.
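The abstract does not say how the characteristic vectors are built, so the following is only a rough illustration of the general idea that structural signatures in sparse graphs can survive perturbation. It is a minimal sketch under stated assumptions and is not the paper's method. It uses a hypothetical signature (each node's neighbour degrees, sorted in decreasing order), an assumed random edge perturbation (delete a fraction of edges, then add the same number of random edges), and brute-force nearest-signature matching. All function names are made up for illustration.

```python
import random
from collections import Counter


def sparse_random_graph(n, avg_degree, rng):
    """Sample a sparse undirected graph on nodes 0..n-1 with n * avg_degree / 2 edges."""
    target = int(n * avg_degree / 2)
    edges = set()
    while len(edges) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))  # store each undirected edge once
    return edges


def adjacency(n, edges):
    """Build adjacency sets from an undirected edge set."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def signature(adj, u):
    """Hypothetical signature (an assumption): u's neighbour degrees, largest first."""
    return sorted((len(adj[w]) for w in adj[u]), reverse=True)


def distance(a, b):
    """L1 distance between two signatures after zero-padding the shorter one."""
    m = max(len(a), len(b))
    a = a + [0] * (m - len(a))
    b = b + [0] * (m - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


def perturb(n, edges, fraction, rng):
    """Assumed perturbation: delete a fraction of edges, then add as many random edges."""
    edges = sorted(edges)
    kept = set(rng.sample(edges, len(edges) - int(fraction * len(edges))))
    while len(kept) < len(edges):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            kept.add((min(u, v), max(u, v)))
    return kept


def unique_fraction(n, edges):
    """Fraction of nodes whose signature is shared by no other node."""
    adj = adjacency(n, edges)
    sigs = [tuple(signature(adj, u)) for u in range(n)]
    counts = Counter(sigs)
    return sum(counts[s] == 1 for s in sigs) / n


def reidentification_rate(n, original, released):
    """Fraction of nodes whose nearest released signature belongs to the same node."""
    adj_o, adj_r = adjacency(n, original), adjacency(n, released)
    sig_o = [signature(adj_o, u) for u in range(n)]
    sig_r = [signature(adj_r, v) for v in range(n)]
    hits = 0
    for u in range(n):
        # Brute-force nearest neighbour; ties go to the lowest node id.
        best = min(range(n), key=lambda v: distance(sig_o[u], sig_r[v]))
        hits += best == u
    return hits / n


if __name__ == "__main__":
    rng = random.Random(0)
    n = 300
    graph = sparse_random_graph(n, avg_degree=4, rng=rng)
    print("nodes with a unique signature:", unique_fraction(n, graph))
    for fraction in (0.0, 0.05, 0.2):
        released = perturb(n, graph, fraction, rng)
        rate = reidentification_rate(n, graph, released)
        print(f"perturbation {fraction:.2f}: re-identified {rate:.2f}")
```

The matching step is a quadratic brute-force search. It was chosen for brevity, not scale. Varying `fraction` shows how the illustrative signature degrades under the assumed noise model. The printed numbers say nothing about the paper's own results.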

Notes

  1. On many social networking sites, links are symmetric and therefore form an undirected network. For a directed network, we assume that both in-links and out-links are known.

  2. Available at http://www.cs.helsinki.fi/u/tsaparas/MACN2006.

  3. Available at http://www.cs.helsinki.fi/u/tsaparas/MACN2006.

  4. Available at http://www.sommer.jp/graphs.

References

  1. Aggarwal CC, Li Y, Yu PS (2011) On the hardness of graph anonymization. ICDM conference

  2. Aggarwal CC, Yu PS (2008) Privacy-preserving data mining: models and algorithms. Springer, Berlin

  3. Agrawal R, Srikant R (2000) Privacy-preserving data mining. ACM SIGMOD conference

  4. Agrawal D, Aggarwal CC (2001) On the design and quantification of privacy preserving data mining algorithms. ACM PODS conference

  5. Ahuja R, Orlin J, Magnanti T (1992) Network flows: theory, algorithms, and applications. Prentice Hall, Englewood Cliffs, NJ, USA

  6. Backstrom L, Dwork C, Kleinberg J (2007) Wherefore Art Thou R3579X? Anonymized social networks, hidden patterns, and structural steganography. WWW conference

  7. Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymization. ICDE conference

  8. Cormode G, Srivastava D, Yu T, Zhang Q (2008) Anonymizing bipartite graph data using safe groupings. VLDB conference

  9. Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. ACM KDD conference

  10. Fung B, Wang K, Yu PS (2007) Anonymizing classification data for privacy preservation. IEEE TKDE, pp 711–725

  11. Garey M, Johnson D (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman

  12. Hay M, Miklau G, Jensen D, Towsley D, Weis P (2008) Resisting structural re-identification in social networks. VLDB conference

  13. Hay M, Miklau G, Jensen D, Weis P, Srivastava S (2007) Anonymizing social networks. Technical report 07–19. University of Massachusetts, Amherst

  14. Kifer D, Gehrke J (2006) Injecting utility into anonymized datasets. SIGMOD conference, pp 217–228

  15. LeFevre K, DeWitt D, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. ICDE conference

  16. Liu K, Terzi E (2008) Towards identity anonymization on graphs. ACM SIGMOD conference

  17. Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-Diversity: privacy beyond k-anonymity. ICDE conference

  18. Samarati P (2001) Protecting respondents' identities in microdata release. IEEE TKDE 13(6):1010–1027

  19. Tassa T, Cohen D (2013) Anonymization of centralized and distributed social networks by sequential clustering. IEEE TKDE 25:311–324

  20. Vuokko N, Terzi E (2010) Reconstructing randomized social networks. SDM conference

  21. Wu L, Ying X, Wu X (2010) Reconstruction from randomized graph via low rank approximation. SDM conference

  22. Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. SDM conference

  23. Ying X, Pan K, Wu X, Guo L (2009) Comparisons of randomization and \(k\)-degree anonymization schemes for privacy-preserving social network publishing. ACM KDD conference

  24. Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. ICDE conference

Acknowledgments

This work is supported in part by NSF through grants CNS-1115234 and OISE-1129076, and US Department of Army through grant W911NF-12-1-0066.

Author information

Correspondence to Charu C. Aggarwal.

Additional information

This paper is an extended version of the ICDM 2011 paper in [1].

About this article

Cite this article

Aggarwal, C.C., Li, Y. & Yu, P.S. On the anonymizability of graphs. Knowl Inf Syst 45, 571–588 (2015). https://doi.org/10.1007/s10115-014-0788-1

  • DOI: https://doi.org/10.1007/s10115-014-0788-1
