DOI: 10.1145/2488388.2488400
research-article

CopyCatch: stopping group attacks by spotting lockstep behavior in social networks

Published: 13 May 2013

ABSTRACT

How can web services that depend on user generated content discern fraudulent input by spammers from legitimate input? In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created. We offer the following contributions: (1) We give a novel problem formulation, with a simple concrete definition of suspicious behavior in terms of graph structure and edge constraints. (2) We offer two algorithms to find such suspicious lockstep behavior - one provably-convergent iterative algorithm and one approximate, scalable MapReduce implementation. (3) We show that our method severely limits "greedy attacks" and analyze the bounds from the application of the Zarankiewicz problem to our setting. Finally, we demonstrate and discuss the effectiveness of CopyCatch at Facebook and on synthetic data, as well as potential extensions to anomaly detection problems in other domains. CopyCatch is actively in use at Facebook, searching for attacks on Facebook's social graph of over a billion users, many millions of Pages, and billions of Page Likes.
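The abstract describes lockstep behavior only in general terms (graph structure plus constraints on edge creation times). As a minimal sketch of one plausible reading, and not the paper's formal definition, the check below treats a group as lockstep when every user in the group liked every page in the group and, for each page, all of those likes fall within a time span of `window` seconds. The function name `is_lockstep`, the `Likes` mapping, and the `window` parameter are hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable, Tuple

# Assumed input shape (hypothetical): (user, page) -> time of the Like, in seconds.
Likes = Dict[Tuple[str, str], float]


def is_lockstep(likes: Likes, users: Iterable[str], pages: Iterable[str],
                window: float) -> bool:
    """Return True if every user liked every page and, per page,
    all of those likes lie within `window` seconds of each other."""
    users = list(users)
    for page in pages:
        times = []
        for user in users:
            t = likes.get((user, page))
            if t is None:
                # A missing edge breaks the complete user-by-page block.
                return False
            times.append(t)
        if times and max(times) - min(times) > window:
            # The likes on this page are too spread out in time.
            return False
    return True
```

A check like this only verifies a candidate group; finding such groups in a graph with billions of edges is the hard part that the paper's algorithms address.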

    Reviews

    Jose F Rodrigues

    In Web 2.0, content is no longer static but dynamically generated by users. In this setting, the more interaction a product page or user profile receives, the greater the potential profit an individual or company may make from advertisements. Fraudulent behavior has therefore emerged in several forms of Web 2.0 interaction: artificial comments, evaluations, recommendations, and likes manufacture interest that may illegitimately inflate the importance of online competitors.

    This paper looks at one specific kind of fraud: illegitimate likes in the Facebook social network. CopyCatch is a method that identifies artificial behavior between users and pages on Facebook. It is designed to find what the authors call lockstep behavior, which occurs when groups of users act together, generally liking the same pages at around the same time. CopyCatch was designed for data at the immense scale produced by Facebook; to cope with this data, it works in parallel, following Hadoop's MapReduce framework.

    Because the problem is comparable to the subspace clustering problem, it is arguably NP-hard; CopyCatch therefore does not try to find the exact solution, only a good one, as is usual for such problems. It uses two heuristics to reach a good solution: the set of users is chosen to maximize the sum of likes for a given set of pages, and the set of liked pages must never shrink. The algorithm is iterative and converges when these two sets stop changing.

    CopyCatch is an interesting massive-scale processing algorithm that has proved its value in the challenging environment of Facebook (cosponsor of the research). It is also a good example of what is possible when the data scale becomes too large: parallel optimization. The paper omits some details of the process for confidentiality reasons, but nothing that would prevent its reproduction.

    Online Computing Reviews Service
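The alternation the review describes (pick users for a fixed set of pages, then let the page set grow but never shrink, until nothing changes) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm or its MapReduce implementation: each page carries a time center, a user counts as a "hit" on a page when their like lies within `delta` of that center, users need at least `min_hits` hits, and a new page is added only when every current user liked it within a span of `2 * delta`. All names (`users_for`, `grow_pages`, `find_group`, `delta`, `min_hits`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed input shape (hypothetical): (user, page) -> time of the Like.
Likes = Dict[Tuple[str, str], float]


def users_for(likes: Likes, centers: Dict[str, float], delta: float,
              min_hits: int) -> Set[str]:
    """Users with at least `min_hits` likes inside the time windows of the current pages."""
    hits: Dict[str, int] = {}
    for (user, page), t in likes.items():
        if page in centers and abs(t - centers[page]) <= delta:
            hits[user] = hits.get(user, 0) + 1
    return {u for u, n in hits.items() if n >= min_hits}


def grow_pages(likes: Likes, users: Set[str], centers: Dict[str, float],
               delta: float) -> Dict[str, float]:
    """Add at most one page that every current user liked within a 2*delta span.
    Pages are only ever added, so the page set never shrinks."""
    by_page: Dict[str, Dict[str, float]] = {}
    for (user, page), t in likes.items():
        if user in users and page not in centers:
            by_page.setdefault(page, {})[user] = t
    for page in sorted(by_page):
        times = by_page[page].values()
        if len(times) == len(users) and max(times) - min(times) <= 2 * delta:
            grown = dict(centers)
            grown[page] = (max(times) + min(times)) / 2  # window midpoint
            return grown
    return centers


def find_group(likes: Likes, seed_centers: Dict[str, float], delta: float,
               min_hits: int) -> Tuple[Set[str], Set[str]]:
    """Alternate the two steps until neither the users nor the pages change."""
    centers = dict(seed_centers)
    users = users_for(likes, centers, delta, min_hits)
    while True:
        new_centers = grow_pages(likes, users, centers, delta)
        new_users = users_for(likes, new_centers, delta, min_hits)
        if new_centers == centers and new_users == users:
            return users, set(centers)
        centers, users = new_centers, new_users
```

In this sketch the loop stops because each pass either adds a page from a finite set or changes nothing; that argument is specific to the simplification here and says nothing about the convergence proof in the paper.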

    • Published in

      ACM Other conferences
      WWW '13: Proceedings of the 22nd international conference on World Wide Web
      May 2013
      1628 pages
      ISBN: 9781450320351
      DOI: 10.1145/2488388

      Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      WWW '13 Paper Acceptance Rate: 125 of 831 submissions, 15%
      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
