Abstract
Given multimillion-node graphs such as “who-follows-whom”, “patent-cites-patent”, “user-likes-page” and “actor/director-makes-movie” networks, how can we find unexpected behaviors? When companies operate on the graphs with monetary incentives to sell Twitter “Followers” and Facebook page “Likes”, the graphs show strange connectivity patterns. In this paper, we study a complete graph from a large Twitter-style social network, spanning up to 3.33 billion edges. We report strange deviations from typical patterns like smooth degree distributions. We find that such deviations are often due to “lockstep behavior” that large groups of followers connect to the same groups of followees. Our first contribution is that we study strange patterns on the adjacency matrix and in the spectral subspaces with respect to several flavors of lockstep. We discover that (a) the lockstep behaviors on the graph shape dense “block” in its adjacency matrix and creates “rays” in spectral subspaces, and (b) partially overlapping of the behaviors shape “staircase” in its adjacency matrix and creates “pearls” in spectral subspaces. The second contribution is that we provide a fast algorithm, using the discovery as a guide for practitioners, to detect users who offer the lockstep behaviors in undirected/directed/bipartite graphs. We carry out extensive experiments on both synthetic and real datasets, as well as public datasets from IMDb and US Patent. The results demonstrate the scalability and effectiveness of our proposed algorithm.
Similar content being viewed by others
References
Becker RA, Volinsky C, Wilks AR (2010) Fraud detection in telecommunications: history and lessons learned. Technometrics 52(1):20–33
Chau DH, Pandit S, Faloutsos C (2006) Detecting fraudulent personalities in networks of online auctioneers. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006. Springer, Berlin Heidelberg, pp 103–114
Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international conference on World Wide Web, pp 119–130. International World Wide Web Conferences Steering Committee
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th international conference on World Wide Web, pp 695-704. ACM
Fortunato Santo (2010) Community detection in graphs. Phys Rep 486(3):75–174
Chen Jie, Saad Yousef (2012) Dense subgraph extraction with application to community detection. Knowl Data Eng IEEE Trans 24(7):1216–1230
Zha H, He X, Ding C, Simon H, Gu M (2001) Bipartite graph partitioning and data clustering. In: Proceedings of the tenth international conference on Information and knowledge management, pp 25–32. ACM
Gnnemann S, Boden B, Frber I, Seidl T (2013) Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In: Pei J, Tseng VS, Cao L, Motoda H, Xu G (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 261–275
Chung F, Linyuan L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882
Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) Inferring strange behavior from connectivity pattern in social networks. In: Tseng VS, Ho TB, Zhou Z-H, Chen ALP, Kao H-Y (eds) Advances in knowledge discovery and data mining. Springer, pp 126–138
Chakrabarti S (2002) Mining the web: discovering knowledge from hypertext data. Elsevier, San Francisco
Aggarwal CC, Wang H (2010) Managing and mining graph data, vol 40. Springer, New York
Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 228–238. ACM
Jiang D, Pei J (2009) Mining frequent cross-graph quasi-cliques. ACM Trans Knowl Discov Data (TKDD) 2(4):16
Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 286–295. ACM
Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inf Syst 24(3):467–497
Bahmani B, Kumar R, Vassilvitskii Sergei (2012) Densest subgraph in streaming and mapreduce. Proc VLDB Endow 5(5):454–465
Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) CatchSync: catching synchronized behavior in large directed graphs. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 941–950. ACM
Moonesinghe HDK, Tan P-N (2008) Outrank: a graph-based outlier detection framework using random walk. Int J Artif Intell Tools 17(1):19–36
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392
Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957
Chakrabarti D (2004) Autopart: Parameter-free graph partitioning and outlier detection. In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202. Springer, Berlin, Heidelberg, pp 112–124
Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. AKDDM 17(1):410–421
Feng J, He X, Konte B, Bhm C, Plant C (2012) Summarization-based mining bipartite graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1249–1257. ACM
Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) Detecting suspicious following behavior in multimillion-node social networks. In: Proceedings of the companion publication of the 23rd international conference on World wide web companion, pp 305–306. International World Wide Web Conferences Steering Committee
Jiang M, Hooi B, Beutel A, Yang S, Cui P, Faloutsos C (2015) A general suspiciousness metric for dense blocks in multimodal data. In: Proceedings of IEEE international conference on data mining. IEEE
Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 907–916. ACM
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering analysis and an algorithm. In: Proceedings of advances in neural information processing systems. Cambridge, MIT Press 14: 849–856
Huang L, Yan D, Jordan MI, Taft N (2008) Spectral clustering with perturbed data. In: NIPS, vol 21
Prakash BA, Sridharan A, Seshadri M, Machiraju S, Faloutsos C (2010) Eigenspokes: surprising patterns and scalable community chipping in large graphs. In: Advances in knowledge discovery and data mining, pp 435–448. Springer, Berlin, Heidelberg
Ying X, Xintao W (2009) On randomness measures for social networks. SDM 9:709–720
Wu L, Ying X, Wu X, Zhou Z-H (2011) Line orthogonality in adjacency eigenspace with application to community partition. In: Proceedings of the twenty-second international joint conference on artificial intelligence, Vol 3, pp 2349–2354. AAAI Press
Newman Mark EJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
Satuluri V, Parthasarathy S (2009) Scalable graph clustering using stochastic flows: applications to community discovery. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 737–746. ACM
Aaron C, Newman MEJ, Cristopher M (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
Wakita K, Tsurumi T (2007) Finding community structure in mega-scale social networks:[extended abstract]. In: Proceedings of the 16th international conference on World Wide Web, pp 1275–1276. ACM
Kalman D (1996) A singularly valuable decomposition: the SVD of a matrix. Coll Math J 27:2–23
Brownrigg DRK (1984) The weighted median filter. Commun ACM 27(8):807–818
Kang U, Meeder B, Papalexakis EE, Faloutsos C (2014) Heigen: spectral analysis for billion-scale graphs. Knowl Data Eng IEEE Trans 26(2):350–362
Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener Janet (2000) Graph structure in the web. Comput Netw 33(1):309–320
Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: ACM SIGCOMM computer communication review, vol 29, no 4, pp 251–262. ACM
Hall BH, Jaffe AB, Trajtenberg M (2001) The NBER patent citations data file: lessons, insights and methodological tools. In: NBER working papers 8498, National Bureau of Economic Research, Inc
Trappey CV, Trappey AJC, Wu C-Y (2001) Clustering patents using non-exhaustive overlaps. J Syst Sci Syst Eng 19(2):162–181
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Jiang, M., Cui, P., Beutel, A. et al. Inferring lockstep behavior from connectivity pattern in large graphs. Knowl Inf Syst 48, 399–428 (2016). https://doi.org/10.1007/s10115-015-0883-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-015-0883-y