Mean User-Text Agglomeration (MUTA): Practical User Representation and Visualization for Detection of Online Influence Operations

Crothers, Evan; Viktor, Herna; Japkowicz, Nathalie

doi:10.1007/978-3-030-91434-9_27


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13116)

Included in the following conference series:

International Conference on Computational Data and Social Networks

Abstract

Online influence operations (OIOs) present a serious threat to the integrity of online social spaces and to real-world democratic elections. While many OIO detection approaches have focused on classification algorithms for individual social media posts (often with artificially balanced datasets), we present a novel system centering around a human analyst. This system incorporates a user representation and visualization procedure for unbalanced social media data. Our content-based social media user representation, the Mean User-Text Agglomeration (MUTA), summarizes a user’s social media activity with respect to Transformer embeddings of texts authored by the user. We apply MUTA to a real social media dataset in advance of an election event and flag a number of suspicious Reddit users that were later removed by the social media platform. When projected to a 2-dimensional visualizable space, MUTA user representations are shown, via extrinsic cluster quality measures, to outperform BERT representations for analyst identification of OIO accounts.
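
The abstract does not spell out how MUTA aggregates a user's text embeddings, so the following is only a minimal sketch under stated assumptions: that each text has already been embedded by a Transformer model, and that the user representation is the element-wise mean of those embeddings (as the name suggests). The function names `mean_user_embeddings` and `project_2d` are hypothetical, the toy vectors stand in for real embeddings, and the PCA projection is a simple stand-in for whichever 2-dimensional projection the paper actually uses.

```python
import numpy as np

def mean_user_embeddings(text_embeddings, user_ids):
    """Average each user's text embeddings into one vector per user.

    Assumption: mean pooling over a user's texts; the paper may differ.
    """
    text_embeddings = np.asarray(text_embeddings, dtype=float)
    # Map every text to the index of its author in the sorted user list.
    users, inverse = np.unique(np.asarray(user_ids), return_inverse=True)
    inverse = inverse.reshape(-1)
    sums = np.zeros((len(users), text_embeddings.shape[1]))
    np.add.at(sums, inverse, text_embeddings)  # per-user sum of embeddings
    counts = np.bincount(inverse, minlength=len(users))
    return users, sums / counts[:, None]

def project_2d(vectors):
    """Project vectors onto their first two principal components (PCA via SVD).

    Assumption: PCA is used here only as a dependency-free placeholder.
    """
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy 3-dimensional "embeddings" for five texts written by three users.
embeddings = [[1, 0, 0], [3, 0, 0], [0, 2, 0], [0, 4, 0], [0, 0, 6]]
authors = ["alice", "alice", "bob", "bob", "carol"]

users, user_vectors = mean_user_embeddings(embeddings, authors)
coords = project_2d(user_vectors)  # one (x, y) point per user for plotting
```

Each row of `coords` could then be drawn on a scatter plot so that an analyst can inspect users who fall close together.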

Author information

Authors and Affiliations

University of Ottawa, Ontario, Canada
Evan Crothers, Herna Viktor & Nathalie Japkowicz
American University, Washington DC, USA
Nathalie Japkowicz

Corresponding author

Correspondence to Evan Crothers.

Editor information

Editors and Affiliations

University of Central Florida, Orlando, FL, USA
David Mohaisen
Kent State University, Kent, OH, USA
Ruoming Jin

About this paper

Cite this paper

Crothers, E., Viktor, H., Japkowicz, N. (2021). Mean User-Text Agglomeration (MUTA): Practical User Representation and Visualization for Detection of Online Influence Operations. In: Mohaisen, D., Jin, R. (eds) Computational Data and Social Networks. CSoNet 2021. Lecture Notes in Computer Science, vol 13116. Springer, Cham. https://doi.org/10.1007/978-3-030-91434-9_27

DOI: https://doi.org/10.1007/978-3-030-91434-9_27
Published: 04 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91433-2
Online ISBN: 978-3-030-91434-9
eBook Packages: Computer Science, Computer Science (R0)
