Abstract
As the popularity and use of social media have exploded, understanding how social network users’ interests evolve has gained importance in diverse fields, ranging from sociological studies to marketing. In this paper, we use two snapshots of the Twitter network and analyze users’ data interest patterns over time to understand individual and collective user behavior on social networks. Building topical profiles of users, we propose novel metrics to identify anomalous friendships and validate our results with Amazon Mechanical Turk experiments. We show that although more than 80% of all friendships on Twitter are created due to data interests, 83% of all users have at least one friendship that can be explained neither by the user’s past interests nor by the collective behavior of other similar users.
Notes
In this paper, we use the term “anomaly” to represent such significant changes in user behavior.
In the Twitter API, the friends of a user are the accounts that the user follows.
Two senators are excluded in bioLDA because of short or blank bios.
Other words from the topic include green, water, power, wind, oil, and gas.
The number of new friendships is greater than the total number of queried Twitter users because we queried Twitter breadth-first, and many new friendships are shared by seed users.
Approved by the Office of Research Compliance, University of Texas at Dallas, human experiment IRB MR 13-231.
For Fleiss’ Kappa, values >0.2 indicate fair agreement, >0.4 moderate agreement, and >0.6 substantial agreement.
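The Fleiss’ Kappa thresholds in the last note can be illustrated with a minimal sketch. It uses the standard Fleiss’ Kappa formula and is not the authors’ evaluation code. The function names `fleiss_kappa` and `agreement_label`, the input layout (one row per rated item, one column per category, each cell holding the number of raters who chose that category), and the toy ratings are assumptions made for illustration. Only the three thresholds come from the note.

```python
from typing import List


def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Standard Fleiss' Kappa.

    ratings[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    if not ratings:
        raise ValueError("at least one rated item is required")
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters or len(row) != n_cats for row in ratings):
        raise ValueError("every item needs the same raters and categories")

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_chance = sum(
        (sum(row[j] for row in ratings) / total) ** 2 for j in range(n_cats)
    )

    if p_chance == 1.0:
        # Degenerate case: all raters always used one category.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def agreement_label(kappa: float) -> str:
    """Map a kappa value to the labels used in the note (assumed cut-offs)."""
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    if kappa > 0.2:
        return "fair"
    return "below fair"


if __name__ == "__main__":
    # Hypothetical toy data: 4 friendships, 3 raters, categories
    # (anomalous, not anomalous).
    toy = [[3, 0], [2, 1], [0, 3], [1, 2]]
    k = fleiss_kappa(toy)
    print(round(k, 3), agreement_label(k))
```

The label function treats each threshold as a strict lower bound, matching the ">" signs in the note. Values at or below 0.2 fall outside the ranges the note defines, so the sketch labels them "below fair".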
Acknowledgments
This work is partially funded by the National Science Foundation (NSF) under grants CAREER CNS-0845803, CNS-0964350, CNS-1016343, CNS-1111529, and CNS-1228198.
Cite this article
Akcora, C.G., Carminati, B., Ferrari, E. et al. Detecting anomalies in social network data consumption. Soc. Netw. Anal. Min. 4, 231 (2014). https://doi.org/10.1007/s13278-014-0231-3
DOI: https://doi.org/10.1007/s13278-014-0231-3