Detecting anomalies in social network data consumption

  • Original Article
Social Network Analysis and Mining

Abstract

As the popularity and usage of social media have exploded over the years, understanding how social network users’ interests evolve has gained importance in diverse fields, ranging from sociological studies to marketing. In this paper, we use two snapshots of the Twitter network and analyze users’ data interest patterns over time to understand individual and collective user behavior on social networks. Building topical profiles of users, we propose novel metrics to identify anomalous friendships, and we validate our results with Amazon Mechanical Turk experiments. We show that although more than 80% of all friendships on Twitter are created due to data interests, 83% of all users have at least one friendship that can be explained by neither the users’ past interests nor the collective behavior of other similar users.

Notes

  1. In this paper, we use the term “anomaly” to represent such significant changes in user behavior.

  2. https://dev.twitter.com/.

  3. In the Twitter API, friends of a user are the accounts followed by the user.

  4. Two senators are excluded in bioLDA because of short or blank bios.

  5. Other words from the topic include green, water, power, wind, oil and gas.

  6. The number of new friendships is greater than the total number of queried Twitter users because we queried Twitter breadth-first, and many new friendships are shared by seed users.

  7. http://sight.dicom.uninsubria.it/anomaly/.

  8. Approved by the Office of Research Compliance, University of Texas at Dallas, human experiment IRB MR 13-231.

  9. For Fleiss’ kappa, values above 0.2 indicate fair agreement, above 0.4 moderate agreement, and above 0.6 substantial agreement.
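
The agreement thresholds in note 9 can be applied mechanically once Fleiss' kappa has been computed. The following is a minimal sketch, not the authors' code: it uses the standard textbook formula for Fleiss' kappa, and the function names `fleiss_kappa` and `agreement_label`, the "below fair" label for values at or below 0.2, and the toy rating table are all assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who put item i into category j (same rater count per item)."""
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Per-item agreement: fraction of rater pairs that agree on the item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement, from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in shares)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa: float) -> str:
    """Map a kappa value to the agreement levels listed in note 9."""
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    if kappa > 0.2:
        return "fair"
    # Assumption: note 9 gives no label for this range.
    return "below fair"


if __name__ == "__main__":
    # Hypothetical table: 4 items, 3 raters each, 2 categories
    # (for example "anomalous" vs "not anomalous").
    toy = [[3, 0], [0, 3], [3, 0], [2, 1]]
    kappa = fleiss_kappa(toy)
    print(round(kappa, 3), agreement_label(kappa))
```

The thresholds are strict inequalities, following the ">" signs in the note, so a value of exactly 0.4 is labeled fair rather than moderate.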

Acknowledgments

This work is partially funded by the National Science Foundation (NSF) under grants CAREER CNS-0845803, CNS-0964350, CNS-1016343, CNS-1111529, and CNS-1228198.

Author information

Corresponding author

Correspondence to Cuneyt Gurcan Akcora.

About this article

Cite this article

Akcora, C.G., Carminati, B., Ferrari, E. et al. Detecting anomalies in social network data consumption. Soc. Netw. Anal. Min. 4, 231 (2014). https://doi.org/10.1007/s13278-014-0231-3

  • DOI: https://doi.org/10.1007/s13278-014-0231-3
