Mean User-Text Agglomeration (MUTA): Practical User Representation and Visualization for Detection of Online Influence Operations

  • Conference paper
Computational Data and Social Networks (CSoNet 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13116)

Abstract

Online influence operations (OIOs) present a serious threat to the integrity of online social spaces and to real-world democratic elections. While many OIO detection approaches have focused on classification algorithms for individual social media posts (often with artificially balanced datasets), we present a novel system centering around a human analyst. This system incorporates a user representation and visualization procedure for unbalanced social media data. Our content-based social media user representation, the Mean User-Text Agglomeration (MUTA), summarizes a user’s social media activity with respect to Transformer embeddings of texts authored by the user. We apply MUTA to a real social media dataset in advance of an election event and flag a number of suspicious Reddit users that were later removed by the social media platform. When projected to a 2-dimensional visualizable space, MUTA user representations are shown, via extrinsic cluster quality measures, to outperform BERT representations for analyst identification of OIO accounts.
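The abstract describes MUTA only at a high level, as a summary of a user's activity over Transformer embeddings of that user's texts. The sketch below illustrates one plausible reading, assuming the representation is a plain per-user mean of text embedding vectors. The function name `mean_user_embeddings`, the toy user names, and the 3-dimensional vectors are hypothetical stand-ins; a real pipeline would obtain the vectors from a Transformer encoder, and the paper's actual procedure may differ.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_user_embeddings(
    posts: Sequence[Tuple[str, Sequence[float]]],
) -> Dict[str, Vector]:
    """Average the embedding vectors of all texts authored by each user.

    Assumption: this plain mean is an illustrative simplification, not a
    confirmed reproduction of the MUTA procedure.
    """
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for user, vec in posts:
        if user not in sums:
            sums[user] = [0.0] * len(vec)
            counts[user] = 0
        if len(vec) != len(sums[user]):
            raise ValueError("all embeddings must share one dimensionality")
        # Accumulate a running component-wise sum for this user.
        for i, x in enumerate(vec):
            sums[user][i] += x
        counts[user] += 1
    # Divide each accumulated sum by the number of texts the user authored.
    return {u: [s / counts[u] for s in total] for u, total in sums.items()}


# Toy 3-dimensional vectors standing in for Transformer text embeddings.
toy_posts = [
    ("user_a", [1.0, 0.0, 2.0]),
    ("user_a", [3.0, 2.0, 0.0]),
    ("user_b", [0.5, 0.5, 0.5]),
]
user_vectors = mean_user_embeddings(toy_posts)
```

Each user ends up with a single fixed-length vector regardless of how many texts they wrote, which is what makes a later projection to two dimensions for analyst inspection straightforward.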

Author information

Corresponding author

Correspondence to Evan Crothers.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Crothers, E., Viktor, H., Japkowicz, N. (2021). Mean User-Text Agglomeration (MUTA): Practical User Representation and Visualization for Detection of Online Influence Operations. In: Mohaisen, D., Jin, R. (eds) Computational Data and Social Networks. CSoNet 2021. Lecture Notes in Computer Science, vol 13116. Springer, Cham. https://doi.org/10.1007/978-3-030-91434-9_27

  • DOI: https://doi.org/10.1007/978-3-030-91434-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91433-2

  • Online ISBN: 978-3-030-91434-9

  • eBook Packages: Computer Science (R0)
