Abstract
When social networks are released for analysis, individuals' sensitive information (e.g., node identities) in the network may be exposed. To avoid unwanted information exposure, social networks need to be anonymized before they are published. The literature contains many approaches that anonymize social networks against adversaries who know structural properties of the network, such as node degrees and neighbors. However, these techniques cannot prevent the leakage of valuable identifying information during social network analysis when the graphs contain both structural and textual information. In this paper, we study the problem of anonymizing social networks to prevent re-identification of individuals using both structural (node degrees) and textual (edge labels) information. We formally define the problem as Structure and Text aware \(K\)-anonymity of social networks (STK-Anonymity). In an STK-anonymized network, each individual is \(ST\)-equivalent to at least \(K-1\) other nodes. The major challenge in achieving STK-Anonymity comes from the correlation of edge labels, which causes anonymization of one edge to propagate to others. Optimally \(K\)-anonymizing the label sequences of edge-labeled graphs has been shown to be intractable. To address this challenge, we present a two-phase approach: the first phase uses two heuristics to process part of the graph structure (node degrees in particular), and the second phase uses a set-enumeration-tree-based approach to anonymize edge labels. Extensive experiments on both real and synthetic datasets demonstrate the effectiveness and efficiency of our approaches.
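To make the \(K-1\) condition concrete, the following minimal Python sketch checks STK-Anonymity under a simplifying assumption: two nodes are treated as \(ST\)-equivalent when they have the same degree and the same sorted multiset of incident edge labels. This signature is a stand-in, not the paper's formal definition. The edge representation (each topological edge stored once as `(u, v)` with a tuple of labels) and the helper names `signatures` and `is_stk_anonymous` are hypothetical. The example also shows why edge-label anonymization propagates: changing the label of a single edge alters the signatures of both of its endpoints.

```python
from collections import Counter, defaultdict

# Hypothetical representation (not from the paper): each topological edge
# (u, v), stored once, maps to the tuple of labels of the parallel edges merged into it.
Edges = dict[tuple[str, str], tuple[str, ...]]
Signature = tuple[int, tuple[str, ...]]


def signatures(edges: Edges) -> dict[str, Signature]:
    """Map each node to (degree, sorted incident labels).

    Assumption: equal signatures stand in for ST-equivalence here.
    """
    neighbours: dict[str, set[str]] = defaultdict(set)
    labels: dict[str, list[str]] = defaultdict(list)
    for (u, v), edge_labels in edges.items():
        neighbours[u].add(v)
        neighbours[v].add(u)
        # An edge's labels are visible from both of its endpoints.
        labels[u].extend(edge_labels)
        labels[v].extend(edge_labels)
    return {n: (len(nbrs), tuple(sorted(labels[n]))) for n, nbrs in neighbours.items()}


def is_stk_anonymous(edges: Edges, k: int) -> bool:
    """True if every node shares its signature with at least k - 1 other nodes."""
    group_sizes = Counter(signatures(edges).values())
    return all(size >= k for size in group_sizes.values())


if __name__ == "__main__":
    g = {("a", "b"): ("db",), ("c", "d"): ("db",),
         ("a", "c"): ("ml",), ("b", "d"): ("ml",)}
    print(is_stk_anonymous(g, 4))  # True: all four nodes share one signature

    # Relabel a single edge: the change reaches both endpoints a and b.
    g[("a", "b")] = ("ai",)
    print(signatures(g)["a"], signatures(g)["c"])
    print(is_stk_anonymous(g, 2), is_stk_anonymous(g, 3))  # True False
```

In the example, relabeling edge `(a, b)` splits one group of four into two groups of two, so the graph drops from 4-anonymous to only 2-anonymous even though just one edge changed.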
Notes
A table with missing values is an exception.
Multiple edges between the same pair of nodes are combined into one topological edge with multiple edge labels.
In anonymizing microdata, the cell-based anonymization strategy (Meyerson and Williams 2004; Park and Shim 2007) uses a many-to-many mapping, i.e., the same label \(el\) may be generalized to multiple different labels. Such a \(K\)-anonymization problem has been shown to be NP-hard even when the attribute values are ternary (Aggarwal et al. 2005). In this work, we do not consider anonymization solutions at such a fine granularity.
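The footnote contrasts cell-based anonymization, where the same label \(el\) may be generalized differently in different cells, with coarser solutions that generalize a label the same way wherever it occurs. The minimal Python sketch below illustrates that distinction. It assumes one label per edge for brevity, and the helper names `apply_global`, `replacement_targets`, and `is_global` are hypothetical, not taken from the paper.

```python
from collections import defaultdict

# Simplification for illustration: each edge carries exactly one label.
Edge = tuple[str, str]
Labeling = dict[Edge, str]


def apply_global(original: Labeling, mapping: dict[str, str]) -> Labeling:
    """Label-wide recoding: one mapping applied to every occurrence of a label.

    Labels absent from `mapping` are published unchanged.
    """
    return {edge: mapping.get(label, label) for edge, label in original.items()}


def replacement_targets(original: Labeling, anonymized: Labeling) -> dict[str, set[str]]:
    """For each original label, collect every label that replaced it."""
    targets: dict[str, set[str]] = defaultdict(set)
    for edge, label in original.items():
        targets[label].add(anonymized[edge])
    return dict(targets)


def is_global(original: Labeling, anonymized: Labeling) -> bool:
    """True if no label was generalized to more than one target (i.e. not cell-based)."""
    return all(len(t) == 1 for t in replacement_targets(original, anonymized).values())


if __name__ == "__main__":
    orig = {("a", "b"): "db", ("c", "d"): "db", ("a", "c"): "ml"}
    print(is_global(orig, apply_global(orig, {"db": "cs"})))  # True
    # Cell-based: the two "db" cells are generalized differently.
    cell = {("a", "b"): "cs", ("c", "d"): "db", ("a", "c"): "ml"}
    print(is_global(orig, cell))  # False
```

Restricting the search to label-wide mappings shrinks the solution space from per-cell choices to one choice per label, which is the coarser granularity the footnote refers to.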
References
Aggarwal CC, Khan A, Yan X (2011) On flow authority discovery in social networks. In: Proceedings of SIAM international conference on data mining (SDM). SIAM/Omnipress, pp 522–533
Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: ICDT, pp 246–258
Backstrom L, Dwork C, Kleinberg JM (2007) Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of World Wide Web Conference (WWW), pp 181–190
Backstrom L, Huttenlocher DP, Kleinberg JM, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining, pp 44–54
Bhagat S, Cormode G, Krishnamurthy B, Srivastava D (2010) Privacy in dynamic social networks. In: Proceedings of World Wide Web Conference (WWW), pp 1059–1060
Bonchi F, Gionis A, Tassa T (2011) Identity obfuscation in graphs through the information theoretic lens. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 924–935
Campan A, Truta TM (2008) Data and structural k-anonymity in social networks. In: ACM international workshop on privacy, security, and trust in KDD (PinKDD), pp 33–54
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of SIAM international conference on data mining (SDM)
Chen C, Yan X, Zhu F, Han J, Yu PS (2008) Graph OLAP: towards online analytical processing on graphs. In: Proceedings of IEEE international conference on data mining (ICDM). IEEE Computer Society, pp 103–112
Cheng J, Fu AWC, Liu J (2010) K-isomorphism: privacy preserving network publication against structural attacks. In: Proceedings of ACM SIGMOD international conference on management of data, pp 459–470
Chester S, Kapron BM, Srivastava G, Venkatesh S (2013) Complexity of social network anonymization. Soc Netw Anal Min 3(2):151–166
Cormen TH, Leiserson CE, Rivest RL (2009) Introduction to Algorithms. The MIT Press, Massachusetts
Cormode G, Srivastava D, Bhagat S, Krishnamurthy B (2009) Class-based graph anonymization for social network data. Proc VLDB Endow 2(1):766–777
Das S, Egecioglu Ö, El Abbadi A (2010) Anonymizing weighted social network graphs. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 904–907
Das S, Egecioglu Ö, El Abbadi A (2012) Anónimos: an LP-based approach for anonymizing weighted social network graphs. IEEE Trans Knowl Data Eng 24(4):590–604
Fard AM, Wang K, Yu PS (2012) Limiting link disclosure in social network analysis through subgraph-wise perturbation. In: Proceedings of international conference on extending database technology (EDBT), pp 109–119
Han J, Yan X, Yu PS (2009) Scalable OLAP and mining of information networks. In: Proceedings of international conference on extending database technology (EDBT), p 1159
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: Proceedings of IEEE international conference on data mining (ICDM), pp 169–178
Hay M, Miklau G, Jensen D, Towsley DF, Li C (2010) Resisting structural re-identification in anonymized social networks. VLDB J 19(6):797–823
Hay M, Miklau G, Jensen D, Towsley DF, Weis P (2008) Resisting structural re-identification in anonymized social networks. Proc VLDB Endow 1(1):102–114
Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymization. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 217–228
Kumar R, Novak J, Tomkins A (2006) Structure and evolution of online social networks. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining, pp 611–617
Lee Y-S (1995) Graphical demonstration of an optimality property of the median. Am Stat 49(4):369–372
LeFevre K, DeWitt DJ, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: Proceedings of ACM SIGMOD international conference on management of data, pp 49–60
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 106–115
Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: Proceedings of ACM SIGMOD international conference on management of data, pp 93–106
Liu L, Wang J, Liu J, Zhang J (2009) Privacy preservation in social networks with sensitive edge weights. In: Proceedings of SIAM international conference on data mining (SDM), pp 954–965
Liu X, Yang X (2011) A generalization based approach for anonymizing weighted social network graphs. In: WAIM, pp 118–130
Lu X, Song Y, Bressan S (2012) Fast identity anonymization on graphs. In: Proceedings of international conference on database and expert systems applications (DEXA), pp 281–295
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-anonymity. In: Proceedings of IEEE international conference on data engineering (ICDE), p 24
McCallum A, Corrada-Emmanuel A, Wang X (2005) Topic and role discovery in social networks. In: International joint conference on artificial intelligence (IJCAI), pp 786–791
Medforth N, Wang K (2011) Privacy risk in graph stream publishing for social network data. In: Proceedings of IEEE international conference on data mining (ICDM), pp 437–446
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proceedings of ACM symposium on principles of database systems (PODS), pp 223–228
Narayanan A, Shmatikov V (2009) De-anonymizing social networks. In: IEEE symposium on security and privacy, pp 173–187
Nobari S, Karras P, Pang H, Bressan S (2014) L-opacity: linkage-aware graph anonymization. In: Proceedings of international conference on extending database technology (EDBT), pp 583–594
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proceedings of ACM SIGMOD international conference on management of data, pp 67–78
Rymon R (1992) Search through systematic set enumeration. In: International conference on principles of knowledge representation and reasoning (KR), pp 539–550
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Seary AJ, Richards WD (2000) Spectral methods for analyzing and visualizing networks: an introduction. In: Workshop summary and papers, pp 209–228
Song Y, Karras P, Xiao Q, Bressan S (2012) Sensitive label privacy protection on social network data. In: International conference on scientific and statistical database management (SSDBM), pp 562–571
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
Tai CH, Yu PS, Yang DN, Chen MS (2011) Privacy-preserving social network publication against friendship attacks. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining, pp 1262–1270
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Wu W, Xiao Y, Wang W, He Z, Wang Z (2010) K-symmetry model for identity anonymization in social networks. In: Proceedings of international conference on extending database technology (EDBT), pp 111–122
Xue M, Karras P, Raïssi C, Kalnis P, Pung HK (2012) Delineating social network data anonymization via random edge perturbation. In: CIKM, pp 475–484
Ying X, Pan K, Wu X, Guo L (2009) Comparisons of randomization and k-degree anonymization schemes for privacy preserving social network publishing. In: Workshop on social network mining and analysis (SNA-KDD), p 10
Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: Proceedings of SIAM international conference on data mining (SDM), pp 739–750
Yuan M, Chen L (2011) Node protection in weighted social networks. In: DASFAA (1), pp 123–137
Yuan M, Chen L, Yu PS (2010) Personalized privacy protection in social networks. Proc VLDB Endow 4(2):141–150
Zheleva E, Getoor L (2007) Preserving the privacy of sensitive relationships in graph data. In: ACM international workshop on privacy, security, and trust in KDD (PinKDD), pp 153–171
Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: Proceedings of IEEE international conference on data engineering (ICDE), pp 506–515
Zhou B, Pei J (2011) The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28(1):47–77
Zou L, Chen L, Özsu MT (2009) K-automorphism: a general framework for privacy preserving network publication. Proc VLDB Endow 2(1):946–957
Appendix
This section contains additional figures with the utility measures for datasets of different sizes: Fig. 23 (DBLP 1000), Fig. 24 (DBLP 2000), Fig. 25 (DBLP 4000), Fig. 26 (DBLP 8000), Fig. 27 (DBLP 32000), Fig. 28 (Synthetic N1000), Fig. 29 (Synthetic N5000), Fig. 30 (Synthetic N10000, E20000), Fig. 31 (Synthetic N10000, E40000), Fig. 32 (Synthetic N10000, E50000), Fig. 33 (Synthetic N20000, E60000). The trends observed in these figures are the same as those in Sect. 6.1.2.
Cite this article
Hao, Y., Cao, H., Hu, C. et al. K-anonymity for social networks containing rich structural and textual information. Soc. Netw. Anal. Min. 4, 223 (2014). https://doi.org/10.1007/s13278-014-0223-3