research-article

CopyCatch: stopping group attacks by spotting lockstep behavior in social networks

Authors:

Alex Beutel,

Wanhong Xu,

Venkatesan Guruswami,

Christopher Palow,

Christos Faloutsos

WWW '13: Proceedings of the 22nd international conference on World Wide Web

Pages 119 - 130

https://doi.org/10.1145/2488388.2488400

Published: 13 May 2013

Abstract

How can web services that depend on user-generated content discern fraudulent input by spammers from legitimate input? In this paper, we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created. We offer the following contributions: (1) We give a novel problem formulation, with a simple concrete definition of suspicious behavior in terms of graph structure and edge constraints. (2) We offer two algorithms to find such suspicious lockstep behavior: one provably convergent iterative algorithm and one approximate, scalable MapReduce implementation. (3) We show that our method severely limits "greedy attacks" and analyze the bounds from the application of the Zarankiewicz problem to our setting. Finally, we demonstrate and discuss the effectiveness of CopyCatch at Facebook and on synthetic data, as well as potential extensions to anomaly detection problems in other domains. CopyCatch is actively in use at Facebook, searching for attacks on Facebook's social graph of over a billion users, many millions of Pages, and billions of Page Likes.
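Contribution (3) rests on the Zarankiewicz problem: z(m, n; s, t) is the largest number of 1s an m × n 0/1 matrix can contain without an s × t all-ones submatrix. Reading rows as users and columns as Pages, it is the most Likes m accounts could spread over n Pages without any s of them sharing the same t Pages (Like times are ignored here). The brute-force Python sketch below is only an illustration for tiny matrices, since it enumerates all 2^(mn) matrices as row bitmasks; the function names are hypothetical, and it does not reproduce the specific bound or parameters derived in the paper.

```python
from itertools import combinations, product


def has_block(rows: tuple[int, ...], s: int, t: int) -> bool:
    """Return True if some s rows share at least t columns, i.e. the matrix
    contains an s x t all-ones submatrix (s users who all Liked the same t Pages)."""
    for group in combinations(rows, s):
        common = group[0]
        for row in group[1:]:
            common &= row  # columns (Pages) shared by every row in the group
        if bin(common).count("1") >= t:
            return True
    return False


def zarankiewicz(m: int, n: int, s: int, t: int) -> int:
    """Brute-force z(m, n; s, t) for tiny m and n. Each of the m rows is an
    n-bit mask, so this enumerates 2**(m*n) matrices."""
    best = 0
    for rows in product(range(1 << n), repeat=m):
        ones = sum(bin(row).count("1") for row in rows)
        if ones <= best:
            continue  # cannot beat the best matrix found so far
        if not has_block(rows, s, t):
            best = ones
    return best


if __name__ == "__main__":
    for size in range(2, 5):
        print(size, zarankiewicz(size, size, 2, 2))
```

Running it prints 3, 6 and 9 for the 2 × 2, 3 × 3 and 4 × 4 cases with s = t = 2, matching the known small values of z(n; 2). For example, three users can place six Likes on three Pages with every pair of users sharing exactly one Page, but a seventh Like forces two users to share two Pages.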

Cited By

Chen Y, Jiang J, Sun S, He B, Chen M (2024). RUSH: Real-Time Burst Subgraph Detection in Dynamic Graphs. Proceedings of the VLDB Endowment 17:11, 3657-3665. DOI 10.14778/3681954.3682028. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.14778/3681954.3682028
Wu Y, Sun R, Wang X, Wen D, Zhang Y, Qin L, Lin X (2024). Efficient Maximal Frequent Group Enumeration in Temporal Bipartite Graphs. Proceedings of the VLDB Endowment 17:11, 3243-3255. DOI 10.14778/3681954.3681997. Online publication date: 30-Aug-2024.
https://doi.org/10.14778/3681954.3681997
Zhang Y, Li R, Zhang Q, Qin H, Wang G (2024). Efficient Algorithms for Density Decomposition on Large Static and Dynamic Graphs. Proceedings of the VLDB Endowment 17:11, 2933-2945. DOI 10.14778/3681954.3681974. Online publication date: 30-Aug-2024.
https://doi.org/10.14778/3681954.3681974

Index Terms

Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis

Recommendations

A one-class classification approach for bot detection on Twitter
Highlights
- A review of the methods used for Twitter Bot Detection.
- A comparison of binary ...
Abstract
Twitter is a popular online social network with hundreds of millions of users, where an important part of the accounts in this social network are not humans. Approximately 48 million Twitter accounts are managed by automated programs ...
Big graph mining for the web and social media: algorithms, anomaly detection, and applications
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

Graphs are everywhere: social networks, computer networks, mobile call networks, the World Wide Web, protein interaction networks, and many more. The lower cost of disk storage, the success of social networking websites and Web 2.0 applications, and ...
Big Social Network Mining for "Following" Patterns
C3S2E '15: Proceedings of the Eighth International C* Conference on Computer Science & Software Engineering

Many social networking sites such as Facebook and Twitter have been used for sharing knowledge and information among social entities. Social entities in these social networks are often linked by some interdependency such as friendship or "following" ...

Reviews

Reviewer: Jose F Rodrigues

In Web 2.0, content is no longer static but dynamically user-generated. In this universe, the more interaction a product page or user profile gets, the greater the potential profits an individual or company may achieve with advertisements. Hence, fraudulent behavior has emerged in several forms of Web 2.0 interaction: artificial comments, evaluations, recommendations, and likes create artificial interest that may illegitimately inflate the perceived importance of online competitors. This paper looks at one specific kind of fraud: illegitimate likes in the Facebook social network. CopyCatch is a method that identifies artificial behavior between users and pages on Facebook. It was designed to find what the authors call lockstep behavior, which occurs when groups of users act together, generally liking the same pages at around the same time. CopyCatch was built for the immense scale of the data Facebook produces; to cope with this data, it works in parallel using Hadoop's MapReduce framework. Because the problem is comparable to subspace clustering, it is arguably NP-hard; CopyCatch therefore does not try to find the exact solution, only a good one, as is expected for such problems. CopyCatch uses two heuristics to reach a good solution: the set of users is chosen to maximize the sum of likes for a given set of pages, and the set of liked pages must never shrink. The algorithm is iterative and converges when neither set changes. CopyCatch is an interesting massive-processing algorithm that has proved its value in the challenging environment of Facebook (a cosponsor of the research). It is also a good example of what parallel optimization makes possible when data grows to this scale. The paper omits some details of the process for confidentiality reasons, but nothing that would prevent its reproduction.

Online Computing Reviews Service
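The two-step iteration described in this review can be sketched as an alternating search. This is a minimal, hypothetical Python illustration, not CopyCatch's actual procedure: the simple threshold rule in `select_users` stands in for the paper's user objective, and the median-based page center times, the parameters `delta_t` (time window) and `rho` (required fraction of Likes), the seed page, and all function names are assumptions made for the example. It does keep the properties the review mentions: users are re-chosen for the current page set, pages are only ever added, and the loop stops once neither set changes.

```python
from statistics import median

# Hypothetical input format: (user, page) -> Like timestamp.
Likes = dict[tuple[str, str], float]


def liked_near(likes: Likes, user: str, page: str, centers: dict[str, float], delta_t: float) -> bool:
    """True if `user` Liked `page` within delta_t of that page's center time."""
    t = likes.get((user, page))
    return t is not None and abs(t - centers[page]) <= delta_t


def select_users(likes, users, pages, centers, delta_t, rho):
    """User step: keep users who Liked at least rho * |pages| of the pages near their centers."""
    need = rho * len(pages)
    return {u for u in users
            if sum(liked_near(likes, u, p, centers, delta_t) for p in pages) >= need}


def grow_pages(likes, group, all_pages, pages, centers, delta_t, rho):
    """Page step: add (never remove) pages that at least rho * |group| members
    Liked within delta_t of the median of their Like times."""
    pages, centers = set(pages), dict(centers)
    if not group:
        return pages, centers
    need = rho * len(group)
    for p in sorted(all_pages - pages):
        times = [likes[(u, p)] for u in group if (u, p) in likes]
        if not times:
            continue
        c = median(times)  # assumed center time for the newly added page
        if sum(abs(t - c) <= delta_t for t in times) >= need:
            pages.add(p)
            centers[p] = c
    return pages, centers


def lockstep_search(likes, seed_pages, seed_centers, delta_t, rho, max_iters=50):
    """Alternate the user step and the page step until neither set changes."""
    users = {u for u, _ in likes}
    all_pages = {p for _, p in likes}
    pages, centers = set(seed_pages), dict(seed_centers)
    group: set[str] = set()
    for _ in range(max_iters):
        new_group = select_users(likes, users, pages, centers, delta_t, rho)
        new_pages, centers = grow_pages(likes, new_group, all_pages, pages, centers, delta_t, rho)
        if new_group == group and new_pages == pages:
            break  # converged: both sets are stable
        group, pages = new_group, new_pages
    return group, pages


if __name__ == "__main__":
    likes = {
        ("b1", "p1"): 100, ("b2", "p1"): 102, ("b3", "p1"): 104,
        ("b1", "p2"): 200, ("b2", "p2"): 201, ("b3", "p2"): 202,
        ("b1", "p3"): 300, ("b2", "p3"): 302, ("b3", "p3"): 304,
        ("h", "p1"): 900, ("h", "p4"): 50,
    }
    group, pages = lockstep_search(likes, {"p1"}, {"p1": 100.0}, delta_t=10, rho=0.6)
    print(sorted(group), sorted(pages))
```

In the toy data, accounts b1, b2 and b3 Like pages p1, p2 and p3 within a few time units of each other, while h behaves independently; seeding the search with p1 prints `['b1', 'b2', 'b3'] ['p1', 'p2', 'p3']`. Because the user step examines each user independently, it is the kind of step that could plausibly be distributed as a map over users in a MapReduce job, although this sketch runs on a single machine.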

Information & Contributors

Information

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

1628 pages

ISBN:9781450320351

DOI:10.1145/2488388

General Chairs: Daniel Schwabe (PUC-Rio, Brazil), Virgílio Almeida (UFMG, Brazil), Hartmut Glaser (CGI.br, Brazil)
Program Chairs: Ricardo Baeza-Yates (Yahoo! Labs, Spain & Chile), Sue Moon (KAIST, South Korea)

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2013

Qualifiers

Research-article

Conference

WWW '13

Sponsor:

NICBR
CGIBR

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 paper acceptance rate: 125 of 831 submissions (15%)

Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 217
Total Downloads: 984
Downloads (last 12 months): 65
Downloads (last 6 weeks): 17

Reflects downloads up to 01 Jan 2025

Citations

Xu Z, Liu M, Tang Z, Wang Y (2024). Unveiling iOS Scamwares through Crowdturfing Reviews. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 399-404. DOI 10.1145/3661167.3661178. Online publication date: 18-Jun-2024.
https://dl.acm.org/doi/10.1145/3661167.3661178
Dai Q, Li R, Cui D, Liao M, Qiu Y, Wang G (2024). Efficient Maximal Biplex Enumerations with Improved Worst-Case Time Guarantee. Proceedings of the ACM on Management of Data 2:3, 1-26. DOI 10.1145/3654938. Online publication date: 30-May-2024.
https://doi.org/10.1145/3654938
Khodabandehlou S, Golpayegani A (2024). FiFrauD: Unsupervised Financial Fraud Detection in Dynamic Graph Streams. ACM Transactions on Knowledge Discovery from Data 18:5, 1-29. DOI 10.1145/3641857. Online publication date: 27-Feb-2024.
https://dl.acm.org/doi/10.1145/3641857
Xiao F, Cai S, Chen G, Jagadish H, Ooi B, Zhang M, Baeza-Yates R, Bonchi F (2024). VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6025-6036. DOI 10.1145/3637528.3671527. Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671527
Sun G, Zhao Y, Li Y, Serra E, Spezzano F (2024). FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate Edges. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2158-2167. DOI 10.1145/3627673.3679812. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679812
Oak R, Shafiq Z (2024). Understanding Underground Incentivized Review Services. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI 10.1145/3613904.3642342. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642342
Wang H, Zhao M, Hu W, Ma Y, Lu Z (2024). Critical Heritage Studies as a Lens to Understand Short Video Sharing of Intangible Cultural Heritage on Douyin. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-21. DOI 10.1145/3613904.3642138. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642138
