K-anonymity for social networks containing rich structural and textual information

  • Original Article
Social Network Analysis and Mining

Abstract

When social networks are released for analysis, individuals’ sensitive information (e.g., node identities) in the network may be exposed. To avoid unwanted information exposure, social networks need to be anonymized before they are published. Many existing approaches anonymize social networks to prevent attacks by adversaries who know structural properties of the network, such as node degrees and neighbors. However, these techniques cannot prevent the leakage of identifying information during social network analysis when the social network graphs contain both structural and textual information. In this paper, we study the problem of anonymizing social networks to prevent the identification of individuals through both structural (node degrees) and textual (edge labels) information in graphs. We formally define the problem as Structure and Text aware \(K\)-anonymity of social networks (STK-Anonymity). In an STK-anonymized network, each individual is \(ST\)-equivalent to at least \(K-1\) other nodes. The major challenge in achieving STK-Anonymity comes from the correlation of edge labels, which causes the propagation of edge anonymization. Optimally \(K\)-anonymizing the label sequences of edge-labeled graphs has been shown to be intractable. To address this challenge, we present a two-phase approach: the first phase uses two heuristics to process partial graph structures (node degrees in particular), and the second phase uses a set-enumeration tree-based approach to anonymize edge labels. Results from extensive experiments on both real and synthetic datasets show the effectiveness and efficiency of our approaches.
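The anonymity condition stated in the abstract can be illustrated with a minimal sketch. The formal definition of \(ST\)-equivalence is not given in this excerpt, so the sketch assumes that two nodes are equivalent when they have the same degree and the same collection of label sets on their incident edges. The function names `st_signature` and `is_stk_anonymous` and the edge representation are hypothetical and not taken from the paper.

```python
from collections import Counter


def st_signature(node, edges):
    """Return an assumed ST-signature: (degree, sorted label sets of incident edges).

    `edges` maps frozenset({u, v}) to the set of labels on that edge.
    This signature is an illustrative assumption, not the paper's definition.
    """
    incident = [labels for pair, labels in edges.items() if node in pair]
    label_seq = tuple(sorted(tuple(sorted(ls)) for ls in incident))
    return (len(incident), label_seq)


def is_stk_anonymous(nodes, edges, k):
    """Check that every node shares its signature with at least k - 1 other nodes."""
    counts = Counter(st_signature(n, edges) for n in nodes)
    return all(c >= k for c in counts.values())


# Toy graph: two disjoint edges with different labels.
nodes = ["a", "b", "c", "d"]
edges = {
    frozenset({"a", "b"}): {"db"},
    frozenset({"c", "d"}): {"ml"},
}
ok_for_2 = is_stk_anonymous(nodes, edges, 2)
ok_for_3 = is_stk_anonymous(nodes, edges, 3)
```

In this toy graph every node has degree 1, so degrees alone would make all four nodes indistinguishable. The edge labels split them into two groups of two, which is why the check passes for \(K=2\) but fails for \(K=3\).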

Notes

  1. A table with missing values is an exception.

  2. Multiple edges are combined into one topological edge with multiple edge labels.

  3. In anonymizing microdata, the cell-based anonymization strategy (Meyerson and Williams 2004; Park and Shim 2007) uses a many-to-many mapping to perform the anonymization, i.e., the same label \(el\) may be generalized to multiple other labels. This \(K\)-anonymization problem has been shown to be NP-hard even when the attribute values are ternary (Aggarwal et al. 2005). In this work, we do not consider anonymization solutions with such fine granularity.

References

  • Aggarwal CC, Khan A, Yan X (2011) On flow authority discovery in social networks. In: Proceedings of SIAM international conference on data mining (SDM). SIAM/Omnipress, pp 522–533

  • Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: ICDT, pp 246–258

  • Backstrom L, Dwork C, Kleinberg JM (2007) Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of World Wide Web Conference (WWW), pp 181–190

  • Backstrom L, Huttenlocher DP, Kleinberg JM, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining, pp 44–54

  • Bhagat S, Cormode G, Krishnamurthy B, Srivastava D (2010) Privacy in dynamic social networks. In: Proceedings of World Wide Web Conference (WWW), pp 1059–1060

  • Bonchi F, Gionis A, Tassa T (2011) Identity obfuscation in graphs through the information theoretic lens. In: Proceedings of IEEE International conference on data engineering (ICDE), pp 924–935

  • Campan A, Truta TM (2008) Data and structural k-anonymity in social networks. In: ACM International workshop on privacy, security, and trust in KDD (PinKDD), pp 33–54

  • Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of SIAM International conference on data mining (SDM)

  • Chen C, Yan X, Zhu F, Han J, Yu PS (2008) Graph olap: towards online analytical processing on graphs. In: Proceedings of IEEE International conference on data mining (ICDM). IEEE Computer Society, pp 103–112

  • Cheng J, Fu AWC, Liu J (2010) K-isomorphism: privacy preserving network publication against structural attacks. In: Proceedings of ACM SIGMOD International conference on management of data, pp 459–470

  • Chester S, Kapron BM, Srivastava G, Venkatesh S (2013) Complexity of social network anonymization. Soc Netw Anal Min 3(2):151–166

  • Cormen TH, Leiserson CE, Rivest RL (2009) Introduction to Algorithms. The MIT Press, Massachusetts

  • Cormode G, Srivastava D, Bhagat S, Krishnamurthy B (2009) Class-based graph anonymization for social network data. Proc VLDB Endow 2(1):766–777

  • Das S, Egecioglu Ö, El Abbadi A (2010) Anonymizing weighted social network graphs. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 904–907

  • Das S, Egecioglu Ö, El Abbadi A (2012) Anónimos: an LP-based approach for anonymizing weighted social network graphs. IEEE Trans Knowl Data Eng 24(4):590–604

  • Fard AM, Wang K, Yu PS (2012) Limiting link disclosure in social network analysis through subgraph-wise perturbation. In: Proceedings of international conference on extending database technology (EDBT), pp 109–119

  • Han J, Yan X, Yu PS (2009) Scalable olap and mining of information networks. In: Proceedings of international conference on extending database technology (EDBT), p 1159

  • Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: Proceedings of IEEE international conference on data mining (ICDM), pp 169–178

  • Hay M, Miklau G, Jensen D, Towsley DF, Li C (2010) Resisting structural re-identification in anonymized social networks. VLDB J 19(6):797–823

  • Hay M, Miklau G, Jensen D, Towsley DF, Weis P (2008) Resisting structural re-identification in anonymized social networks. Proc VLDB Endow 1(1):102–114

  • Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymization. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 217–228

  • Kumar R, Novak J, Tomkins A (2006) Structure and evolution of online social networks. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining, pp 611–617

  • Lee Y-S (1995) Graphical demonstration of an optimality property of the median. Am Stat 49(4):369–372

  • LeFevre K, DeWitt DJ, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: Proceedings of ACM SIGMOD international conference on management of data, pp 49–60

  • Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 106–115

  • Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: Proceedings of ACM SIGMOD international conference on management of data, pp 93–106

  • Liu L, Wang J, Liu J, Zhang J (2009) Privacy preservation in social networks with sensitive edge weights. In: Proceedings of SIAM international conference on data mining (SDM), pp 954–965

  • Liu X, Yang X (2011) A generalization based approach for anonymizing weighted social network graphs. In: WAIM, pp 118–130

  • Lu X, Song Y, Bressan S (2012) Fast identity anonymization on graphs. In: Proceedings of international conference on database and expert systems applications (DEXA), pp 281–295

  • Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-anonymity. In: Proceedings of IEEE international conference on data engineering (ICDE), p 24

  • McCallum A, Corrada-Emmanuel A, Wang X (2005) Topic and role discovery in social networks. In: International joint conference on artificial intelligence (IJCAI), pp 786–791

  • Medforth N, Wang K (2011) Privacy risk in graph stream publishing for social network data. In: Proceedings of IEEE international conference on data mining (ICDM), pp 437–446

  • Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proceedings of ACM symposium on principles of database systems (PODS), pp 223–228

  • Narayanan A, Shmatikov V (2009) De-anonymizing social networks. In: IEEE symposium on security and privacy, pp 173–187

  • Nobari S, Karras P, Pang H, Bressan S (2014) L-opacity: linkage-aware graph anonymization. In: Proceedings of international conference on extending database technology (EDBT), pp 583–594

  • Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proceedings of ACM SIGMOD international conference on management of data, pp 67–78

  • Rymon R (1992) Search through systematic set enumeration. In: International conference on principles of knowledge representation and reasoning (KR), pp 539–550

  • Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

  • Seary AJ, Richards WD (2000) Spectral methods for analyzing and visualizing networks: an introduction. In: Workshop summary and papers, pp 209–228

  • Song Y, Karras P, Xiao Q, Bressan S (2012) Sensitive label privacy protection on social network data. In: International conference on scientific and statistical database management (SSDBM), pp 562–571

  • Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

  • Tai CH, Yu PS, Yang DN, Chen MS (2011) Privacy-preserving social network publication against friendship attacks. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining, pp 1262–1270

  • Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442

  • Wu W, Xiao Y, Wang W, He Z, Wang Z (2010) K-symmetry model for identity anonymization in social networks. In: Proceedings of international conference on extending database technology (EDBT), pp 111–122

  • Xue M, Karras P, Raïssi C, Kalnis P, Pung HK (2012) Delineating social network data anonymization via random edge perturbation. In: CIKM, pp 475–484

  • Ying X, Pan K, Wu X, Guo L (2009) Comparisons of randomization and k-degree anonymization schemes for privacy preserving social network publishing. In: Workshop on social network mining and analysis (SNA-KDD), p 10

  • Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp 739–750

  • Yuan M, Chen L (2011) Node protection in weighted social networks. DASFAA 1:123–137

  • Yuan M, Chen L, Yu PS (2010) Personalized privacy protection in social networks. Proc VLDB Endow 4(2):141–150

  • Zheleva E, Getoor L (2007) Preserving the privacy of sensitive relationships in graph data. In: ACM international workshop on privacy, security, and trust in KDD (PinKDD), pp 153–171

  • Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 506–515

  • Zhou B, Pei J (2011) The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28(1):47–77

  • Zou L, Chen L, Özsu MT (2009) K-automorphism: a general framework for privacy preserving network publication. Proc VLDB Endow 2(1):946–957

Author information

Corresponding author

Correspondence to Huiping Cao.

Appendix

This section contains additional figures showing the utility measures for datasets of different sizes: Fig. 23 (DBLP 1000), Fig. 24 (DBLP 2000), Fig. 25 (DBLP 4000), Fig. 26 (DBLP 8000), Fig. 27 (DBLP 32000), Fig. 28 (Synthetic N1000), Fig. 29 (Synthetic N5000), Fig. 30 (Synthetic N10000, E20000), Fig. 31 (Synthetic N10000, E40000), Fig. 32 (Synthetic N10000, E50000), Fig. 33 (Synthetic N20000, E60000). The trend observed in these figures is the same as that in Sect. 6.1.2.

Fig. 23 Utility measures: DBLP1000

Fig. 24 Utility measures: DBLP2000

Fig. 25 Utility measures: DBLP4000

Fig. 26 Utility measures: DBLP8000

Fig. 27 Utility measures: DBLP32000

Fig. 28 Utility measures: Synthetic N1000

Fig. 29 Utility measures: Synthetic N5000

Fig. 30 Utility measures: Synthetic N10000 E20000

Fig. 31 Utility measures: Synthetic N10000 E40000

Fig. 32 Utility measures: Synthetic N10000 E50000

Fig. 33 Utility measures: Synthetic N20000

About this article

Cite this article

Hao, Y., Cao, H., Hu, C. et al. K-anonymity for social networks containing rich structural and textual information. Soc. Netw. Anal. Min. 4, 223 (2014). https://doi.org/10.1007/s13278-014-0223-3

  • DOI: https://doi.org/10.1007/s13278-014-0223-3
