ABSTRACT
Online social network providers have become treasure troves of information for marketers and researchers. To profit from their data while honoring the privacy of their customers, social networking services share 'anonymized' social network datasets in which, for example, the identities of users are removed from the social network graph. However, researchers have shown that such datasets can be de-anonymized by using external information such as a reference social graph (from the same network or from another network with similar users). These approaches use 'network alignment' techniques to map nodes from the reference graph into the anonymized graph, and their performance often degrades with larger network sizes, fewer seeds, and noise, which may be added to preserve privacy. We propose a divide-and-conquer approach to strengthen the power of such algorithms. Our approach partitions the networks into 'communities' and performs a two-stage mapping: first at the community level, and then for the entire network. Through extensive simulation on real-world social network datasets, we show how such community-aware network alignment improves de-anonymization performance under high levels of noise, large network sizes, and a low number of seeds. Even when nodes cannot be explicitly mapped, the community structure can be mapped between both networks, thus reducing the anonymity of users. For example, for our (real-world) Twitter dataset with 90,000 nodes, 20% noise, and 16 seeds, the state-of-the-art technique reduces anonymity by 0 bits, whereas our approach reduces anonymity by 9.71 bits (with 40% of nodes mapped).
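The abstract does not define how anonymity in bits is measured. As a rough illustration only, the sketch below assumes a uniform entropy-based measure: a user who could be any of n equally likely nodes has log2(n) bits of anonymity, and mapping that user's community narrows the candidates to the community's members. The function names and the example sizes are hypothetical and are not taken from the paper.

```python
import math


def anonymity_bits(candidates: int) -> float:
    """Entropy, in bits, of a uniform distribution over `candidates` nodes.

    Assumption: every candidate is equally likely to be the target user.
    """
    if candidates < 1:
        raise ValueError("need at least one candidate")
    return math.log2(candidates)


def anonymity_reduction(network_size: int, community_size: int) -> float:
    """Bits of anonymity lost when a user is narrowed from the whole
    network down to a single mapped community (hypothetical helper)."""
    return anonymity_bits(network_size) - anonymity_bits(community_size)


# Hypothetical sizes: a 1,024-node network and a 64-node community.
print(anonymity_reduction(1024, 64))  # 4.0
```

Under this assumption, a user whose community is mapped but whose node is not still loses anonymity, which matches the abstract's point that community-level mapping alone reduces the anonymity of users.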