Characterizing the language-production dynamics of social media users

  • Original Article
Social Network Analysis and Mining

Abstract

In this paper, we propose a characterization of social media users based on their language usage over time, in order to make the notions of organic and inorganic online behavior more rigorous. This characterization describes the extent to which a user’s word usage within a particular time period subverts expectations formed from preceding time periods. To do this, we adapt an information-theoretic measure of cognitive surprise and apply it to a set of behaviorally diverse Twitter users. We compare the language-production dynamics of these users based on term frequencies at multiple levels of granularity, and we illustrate the intuition behind this characterization through case studies of salient users identified by this method. Through these case studies, we find that this characterization can be linked to the degree to which a user’s word usage is organic, inorganic, or a mixture of both.
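The abstract does not state the formula behind the surprise measure. As a minimal sketch, assume a user's tweets have already been grouped into token windows and that the surprise of a window is the Kullback–Leibler divergence, in bits, of its term-frequency distribution p_t from the preceding window's distribution p_{t-1}, i.e. the sum over words w of p_t(w) log2(p_t(w) / p_{t-1}(w)). The helper names, the add-one smoothing, and the comparison against only the single preceding window are all hypothetical choices; the paper speaks of "preceding time periods", so its measure may draw on several earlier windows.

```python
import math
from collections import Counter


def term_distribution(tokens, vocab, alpha=1.0):
    """Additively smoothed relative term frequencies over a shared vocabulary.

    alpha > 0 keeps every probability positive so the KL divergence stays
    finite (an assumed smoothing choice, not necessarily the paper's).
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def surprise(current_tokens, previous_tokens, alpha=1.0):
    """Surprise in bits, assumed here to be D_KL(P_current || Q_previous)."""
    vocab = set(current_tokens) | set(previous_tokens)
    p = term_distribution(current_tokens, vocab, alpha)
    q = term_distribution(previous_tokens, vocab, alpha)
    return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab)


def surprise_series(windows, alpha=1.0):
    """Surprise of each token window relative to the window just before it."""
    return [surprise(windows[i], windows[i - 1], alpha)
            for i in range(1, len(windows))]


# Hypothetical users: one repeats itself verbatim, one keeps changing topic.
repetitive = [["buy", "now", "deal"]] * 3
varied = [["coffee", "rain", "monday"], ["game", "tonight", "win"],
          ["new", "book", "reading"]]
print(surprise_series(repetitive))  # [0.0, 0.0]
print(surprise_series(varied))      # two values, each about 1/3 bit
```

Under this assumed measure, an account that repeats itself verbatim scores zero surprise at every step, while one whose vocabulary shifts completely between windows scores consistently positive values, which suggests how such a measure could separate highly repetitive accounts from more varied ones.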

Notes

  1. The reason for doing so is to ensure that each group of tweets represents a roughly equal amount of linguistic activity, regardless of the time spanned by each group. If we instead grouped tweets by a fixed unit of time (e.g., all tweets from the same month), we would run into a serious problem whenever a user has periods of reduced tweet activity. Such periods would yield sparser word distributions, which would cause the calculated surprise (detailed in the following section) to be arbitrarily high. Put more simply, if we want to compare a user’s tweet to the tweet before it, the time lag between the two tweets is irrelevant: the previous tweet represents the user’s most recent action and is thus all we have available for the comparison.
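The contrast drawn in this note can be illustrated with a short sketch. It assumes tweets are hypothetical dicts with a `time` field and that groups are formed from a fixed number of consecutive tweets, with any trailing partial group discarded; the paper may instead balance groups by another activity measure (for example, token counts) or handle the remainder differently.

```python
from collections import defaultdict
from datetime import datetime


def group_by_count(tweets, group_size):
    """Consecutive groups of `group_size` tweets, regardless of elapsed time.

    A trailing partial group is dropped (an assumption) so that every group
    carries the same number of tweets.
    """
    ordered = sorted(tweets, key=lambda tw: tw["time"])
    n_full = len(ordered) // group_size
    return [ordered[i * group_size:(i + 1) * group_size] for i in range(n_full)]


def group_by_month(tweets):
    """Calendar-month groups, whose sizes rise and fall with activity."""
    groups = defaultdict(list)
    for tw in sorted(tweets, key=lambda tw: tw["time"]):
        groups[(tw["time"].year, tw["time"].month)].append(tw)
    return [groups[k] for k in sorted(groups)]


# Hypothetical user: busy in January, nearly silent in February.
tweets = [{"time": datetime(2018, 1, d), "text": f"jan {d}"} for d in range(1, 7)]
tweets.append({"time": datetime(2018, 2, 14), "text": "feb"})
tweets += [{"time": datetime(2018, 3, d), "text": f"mar {d}"} for d in range(1, 6)]

print([len(g) for g in group_by_month(tweets)])     # [6, 1, 5]
print([len(g) for g in group_by_count(tweets, 4)])  # [4, 4, 4]
```

The single February tweet forms a one-tweet monthly group, which is exactly the kind of sparse word distribution the note warns against; count-based groups keep the amount of text per group constant.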

Acknowledgements

The authors wish to thank Stefano Cresci and colleagues for generously making the dataset available to us. This research is funded in part by the U.S. National Science Foundation (IIS-1636933, ACI-1429160, and IIS-1110868), U.S. Office of Naval Research (N00014-10-1-0091, N00014-14-1-0489, N00014-15-P-1187, N00014-16-1-2016, N00014-16-1-2412, N00014-17-1-2605, N00014-17-1-2675, N00014-19-1-2336), U.S. Air Force Research Lab, U.S. Army Research Office (W911NF-16-1-0189), U.S. Defense Advanced Research Projects Agency (W31P4Q-17-C-0059), Arkansas Research Alliance, and the Jerry L. Maulden/Entergy Endowment at the University of Arkansas at Little Rock. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations. The researchers gratefully acknowledge the support.

Author information

Corresponding author

Correspondence to Zachary K. Stine.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Stine, Z.K., Agarwal, N. Characterizing the language-production dynamics of social media users. Soc. Netw. Anal. Min. 9, 60 (2019). https://doi.org/10.1007/s13278-019-0605-7

  • DOI: https://doi.org/10.1007/s13278-019-0605-7
