DOI: 10.1145/2488388.2488400
research-article

CopyCatch: stopping group attacks by spotting lockstep behavior in social networks

Published: 13 May 2013

Abstract

How can web services that depend on user-generated content discern fraudulent input by spammers from legitimate input? In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created. We offer the following contributions: (1) We give a novel problem formulation, with a simple concrete definition of suspicious behavior in terms of graph structure and edge constraints. (2) We offer two algorithms to find such suspicious lockstep behavior: one provably-convergent iterative algorithm and one approximate, scalable MapReduce implementation. (3) We show that our method severely limits "greedy attacks" and analyze the bounds from the application of the Zarankiewicz problem to our setting. Finally, we demonstrate and discuss the effectiveness of CopyCatch at Facebook and on synthetic data, as well as potential extensions to anomaly detection problems in other domains. CopyCatch is actively in use at Facebook, searching for attacks on Facebook's social graph of over a billion users, many millions of Pages, and billions of Page Likes.
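To make the notion of lockstep behavior concrete, here is a minimal sketch; it is not the paper's algorithm. It assumes Likes are stored as a dict mapping a (user, page) pair to a timestamp, and it treats a group as lockstep when every user in the group Liked every Page in the group and, for each Page, all of those Likes fall inside a window of width `2 * delta_t`. The function name `is_lockstep`, the data layout, and the parameter `delta_t` are illustrative assumptions, not names taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

Like = Tuple[Hashable, Hashable]  # (user, page); hypothetical layout


def is_lockstep(
    likes: Dict[Like, float],
    users: Iterable[Hashable],
    pages: Iterable[Hashable],
    delta_t: float,
) -> bool:
    """Return True if every user Liked every page and, for each page,
    all Like times lie within a window of width 2 * delta_t."""
    users, pages = list(users), list(pages)
    if not users or not pages:
        return False
    for page in pages:
        times = []
        for user in users:
            t = likes.get((user, page))
            if t is None:  # missing edge: the group is not a complete core
                return False
            times.append(t)
        if max(times) - min(times) > 2 * delta_t:  # Likes too spread out
            return False
    return True


# Toy data (invented for illustration): u1..u3 Like both Pages within
# seconds of each other; u4 Likes the same Pages at unrelated times.
toy_likes = {
    ("u1", "pA"): 100.0, ("u2", "pA"): 105.0, ("u3", "pA"): 110.0,
    ("u1", "pB"): 500.0, ("u2", "pB"): 498.0, ("u3", "pB"): 503.0,
    ("u4", "pA"): 9000.0, ("u4", "pB"): 120.0,
}
print(is_lockstep(toy_likes, ["u1", "u2", "u3"], ["pA", "pB"], delta_t=10.0))
print(is_lockstep(toy_likes, ["u1", "u2", "u4"], ["pA", "pB"], delta_t=10.0))
```

Checking a given group is cheap; the hard part the paper addresses is finding such groups in a graph with billions of edges, which this sketch does not attempt.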

    Reviews

    Jose F Rodrigues

    In Web 2.0, content is no longer static but dynamically generated by users. In this setting, the more interaction a product page or user profile receives, the greater the advertising profit an individual or company may earn. Fraudulent behavior has therefore emerged in several forms of Web 2.0 interaction: artificial comments, evaluations, recommendations, and likes create artificial interest that may illegitimately inflate the importance of online competitors. This paper looks at one specific kind of fraud: illegitimate likes in the Facebook social network.

    CopyCatch is a method that identifies artificial behavior between users and pages on Facebook. It is designed to find what the authors call lockstep behavior, which occurs when groups of users act together, generally liking the same pages at around the same time. CopyCatch was designed for the immense scale of data that Facebook produces; to cope with this data, it works in parallel, following Hadoop's MapReduce framework.

    Because the problem is comparable to the subspace clustering problem, it is arguably NP-hard; CopyCatch therefore does not try to find the exact solution, only a good one, as is usual for such problems. It uses two heuristics to reach a good solution: the set of users is chosen to maximize the sum of likes for a given set of pages, and the set of liked pages must never shrink. The algorithm is iterative and converges when these two sets no longer change.

    CopyCatch is an interesting massive processing algorithm that has proved its value in the challenging environment of Facebook (cosponsor of the research). It is also a good example of what parallel optimization makes possible when data grows too large for serial processing. The paper excludes some details of the process due to confidentiality, but nothing that would prevent its reproduction.

    Online Computing Reviews Service
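The alternation described in the review (pick users for a fixed set of pages, let the page set grow but never shrink, stop when neither set changes) can be sketched serially as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm or its MapReduce implementation: the function name `find_lockstep_group`, the per-page median "center" time, the parameters `delta_t`, `min_frac`, and `min_users`, and the rule for admitting new pages are all hypothetical choices. Convergence is not proven for this toy, so the loop is capped by `max_iters`.

```python
from statistics import median


def find_lockstep_group(likes, seed_pages, delta_t,
                        min_frac=1.0, min_users=2, max_iters=20):
    """likes: dict page -> dict user -> like time (hypothetical layout).
    Returns (users, pages) once neither set changes or max_iters is hit."""
    pages = set(seed_pages)
    users = set()
    # Assumed starting point: center each seed page on its median like time.
    centers = {p: median(likes[p].values()) for p in pages}
    for _ in range(max_iters):
        # Step 1: keep users who liked enough current pages near each center.
        counts = {}
        for p in pages:
            for u, t in likes[p].items():
                if abs(t - centers[p]) <= delta_t:
                    counts[u] = counts.get(u, 0) + 1
        new_users = {u for u, c in counts.items() if c >= min_frac * len(pages)}
        # Step 2: recenter each page on the like times of the kept users.
        new_centers = {}
        for p in pages:
            ts = [t for u, t in likes[p].items()
                  if u in new_users and abs(t - centers[p]) <= delta_t]
            new_centers[p] = median(ts) if ts else centers[p]
        # Step 3: grow the page set (pages are added, never removed).
        new_pages = set(pages)
        for p, by_user in likes.items():
            if p in new_pages:
                continue
            ts = [by_user[u] for u in new_users if u in by_user]
            if (len(ts) >= min_users and len(ts) == len(new_users)
                    and max(ts) - min(ts) <= 2 * delta_t):
                new_pages.add(p)
                new_centers[p] = median(ts)
        if (new_users, new_pages, new_centers) == (users, pages, centers):
            break  # nothing changed: treat as converged
        users, pages, centers = new_users, new_pages, new_centers
    return users, pages


# Toy data (invented): u1..u3 like pA, pB, pC in lockstep; x1, x2 do not.
toy = {
    "pA": {"u1": 100, "u2": 104, "u3": 108, "x1": 5000},
    "pB": {"u1": 300, "u2": 301, "u3": 305, "x2": 40},
    "pC": {"u1": 700, "u2": 702, "u3": 699},
    "pD": {"u1": 50, "x1": 9000},
}
group_users, group_pages = find_lockstep_group(toy, ["pA"], delta_t=10)
print(sorted(group_users), sorted(group_pages))
```

Starting from the single seed page `pA`, the sketch keeps the three users whose likes cluster in time and then pulls in `pB` and `pC`, which those same users liked within a narrow window; `pD` is left out because only one of them liked it.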

    Published In

    WWW '13: Proceedings of the 22nd international conference on World Wide Web
    May 2013
    1628 pages
    ISBN:9781450320351
    DOI:10.1145/2488388

    Sponsors

    • NICBR: Núcleo de Informação e Coordenação do Ponto BR
    • CGIBR: Comitê Gestor da Internet no Brasil

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. anomaly detection
    2. bipartite cores
    3. mapreduce
    4. social networks

    Qualifiers

    • Research-article

    Conference

    WWW '13
    Sponsor:
    • NICBR
    • CGIBR
    WWW '13: 22nd International World Wide Web Conference
    May 13 - 17, 2013
    Rio de Janeiro, Brazil

    Acceptance Rates

    WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (Last 12 months): 65
    • Downloads (Last 6 weeks): 17
    Reflects downloads up to 01 Jan 2025

    Cited By

    • (2024) RUSH: Real-Time Burst Subgraph Detection in Dynamic Graphs. Proceedings of the VLDB Endowment 17:11 (3657-3665). DOI: 10.14778/3681954.3682028. Online publication date: 1-Jul-2024
    • (2024) Efficient Maximal Frequent Group Enumeration in Temporal Bipartite Graphs. Proceedings of the VLDB Endowment 17:11 (3243-3255). DOI: 10.14778/3681954.3681997. Online publication date: 30-Aug-2024
    • (2024) Efficient Algorithms for Density Decomposition on Large Static and Dynamic Graphs. Proceedings of the VLDB Endowment 17:11 (2933-2945). DOI: 10.14778/3681954.3681974. Online publication date: 30-Aug-2024
    • (2024) Unveiling iOS Scamwares through Crowdturfing Reviews. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (399-404). DOI: 10.1145/3661167.3661178. Online publication date: 18-Jun-2024
    • (2024) Efficient Maximal Biplex Enumerations with Improved Worst-Case Time Guarantee. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654938. Online publication date: 30-May-2024
    • (2024) FiFrauD: Unsupervised Financial Fraud Detection in Dynamic Graph Streams. ACM Transactions on Knowledge Discovery from Data 18:5 (1-29). DOI: 10.1145/3641857. Online publication date: 27-Feb-2024
    • (2024) VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (6025-6036). DOI: 10.1145/3637528.3671527. Online publication date: 25-Aug-2024
    • (2024) FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate Edges. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2158-2167). DOI: 10.1145/3627673.3679812. Online publication date: 21-Oct-2024
    • (2024) Understanding Underground Incentivized Review Services. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-18). DOI: 10.1145/3613904.3642342. Online publication date: 11-May-2024
    • (2024) Critical Heritage Studies as a Lens to Understand Short Video Sharing of Intangible Cultural Heritage on Douyin. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-21). DOI: 10.1145/3613904.3642138. Online publication date: 11-May-2024
