ABSTRACT
How can web services that depend on user-generated content discern fraudulent input by spammers from legitimate input? In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created. We offer the following contributions: (1) We give a novel problem formulation, with a simple, concrete definition of suspicious behavior in terms of graph structure and edge constraints. (2) We offer two algorithms to find such suspicious lockstep behavior: a provably convergent iterative algorithm and an approximate, scalable MapReduce implementation. (3) We show that our method severely limits "greedy attacks" and analyze the bounds that follow from applying the Zarankiewicz problem to our setting. Finally, we demonstrate and discuss the effectiveness of CopyCatch at Facebook and on synthetic data, as well as potential extensions to anomaly detection problems in other domains. CopyCatch is actively in use at Facebook, searching for attacks on Facebook's social graph of over a billion users, many millions of Pages, and billions of Page Likes.
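To make the notion of lockstep behavior concrete, the following minimal Python sketch checks whether a given group of users Liked every Page in a given set, with the Likes on each Page falling inside a short time window. It is an illustration only and not the CopyCatch algorithm itself: the abstract does not spell out the formal definition, so the function name `is_lockstep`, the `likes` mapping, and the `window` parameter are assumptions made for this example.

```python
def is_lockstep(likes, users, pages, window):
    """Return True if every user Liked every page and, for each page,
    all of those Likes fall within `window` seconds of one another.

    likes  : dict mapping (user, page) -> Like timestamp in seconds
             (hypothetical layout, assumed for this sketch)
    users  : iterable of user ids forming the candidate group
    pages  : iterable of Page ids forming the candidate target set
    window : maximum allowed spread of Like times per page (assumed)
    """
    users = list(users)
    pages = list(pages)
    if not users or not pages:
        return False  # an empty group is not treated as suspicious

    for page in pages:
        times = []
        for user in users:
            t = likes.get((user, page))
            if t is None:
                return False  # missing edge: not a complete bipartite block
            times.append(t)
        if max(times) - min(times) > window:
            return False  # Likes on this page are too spread out in time
    return True


# Toy data: three users Like two Pages within about a minute of each other.
likes = {
    ("u1", "p1"): 100, ("u2", "p1"): 130, ("u3", "p1"): 160,
    ("u1", "p2"): 5000, ("u2", "p2"): 5020, ("u3", "p2"): 5055,
    ("u4", "p1"): 90000,
}

print(is_lockstep(likes, ["u1", "u2", "u3"], ["p1", "p2"], window=3600))
print(is_lockstep(likes, ["u1", "u4"], ["p1"], window=3600))
```

This check only verifies a candidate group that is already known. Finding such groups in a graph with billions of edges is the hard part, which is what the iterative and MapReduce algorithms mentioned in the abstract address.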
Index Terms
- CopyCatch: stopping group attacks by spotting lockstep behavior in social networks