ABSTRACT
You are on Facebook or you are out. Of course, this assessment is controversial and its rationale arguable. It is nevertheless not far, for many of us, from the reason behind our joining social media and publishing and sharing details of our professional and private lives. Not only the personal details we may reveal but also the very structure of the networks themselves are sources of invaluable information for any organization wanting to understand and learn about social groups, their dynamics and their members. These organizations may or may not be benevolent. It is therefore important to devise, design and evaluate solutions that guarantee some privacy. One approach that attempts to reconcile the different stakeholders' requirements is the publication of a modified graph. The hope is that the perturbation is sufficient to protect members' privacy while preserving enough utility for analysts wanting to study the social media as a whole. It is necessarily a compromise. In this paper we try to quantify empirically the inevitable trade-off between utility and privacy. We do so for one state-of-the-art graph anonymization algorithm that protects against most structural attacks, the k-automorphism algorithm. We measure several metrics for a series of real graphs from various social media before and after their anonymization under various settings.
On the privacy and utility of anonymized social networks