Small Languages and Big Models: Using ML to Generate Norwegian Language Social Media Content for Training Purposes

Aasen, Ole Joachim Arnesen; Lugo, Ricardo G.; Knox, Benjamin J.

doi:10.1007/978-3-031-61572-6_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14695)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

The advancement of language models has showcased their tremendous potential for both beneficial uses and harmful misuse. However, the majority of research has concentrated on high-resource languages, leaving much to be desired for low-resource languages. This article explores the use of language models in Norwegian, a low-resource language, and addresses the threats these models pose in the context of influence operations on social media.

The study uses a mixed-methods approach, combining quantitative analysis and qualitative investigation. The quantitative analysis evaluates the performance of language models across various contexts, assesses their ability to generate content that is perceived as authentic, and analyzes user responses to such generated content. The qualitative investigation consists of interviews and surveys that gather insights from participants, with the aim of understanding their experiences, perceptions, and concerns regarding the use of language models.

By investigating the use of language models in a low-resource language, this article aims to contribute to the advancement of natural language processing research in an underrepresented linguistic context, and to explore the use of these language models for training purposes in isolated social networks.

Author information

Authors and Affiliations

Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Gjøvik, Norway
Ole Joachim Arnesen Aasen & Benjamin J. Knox
Center for Digital Forensics and Cyber Security, TalTech, Tallinn, Estonia
Ricardo G. Lugo
Faculty of Health, Welfare and Organisation, Østfold University College, Halden, Norway
Ricardo G. Lugo & Benjamin J. Knox
Norwegian Armed Forces Cyber Defence, Jørstadmoen, Norway
Benjamin J. Knox

Corresponding author

Correspondence to Ole Joachim Arnesen Aasen.

Editor information

Editors and Affiliations

Soar Technology Inc., Orlando, FL, USA
Dylan D. Schmorrow
Katmai Government Services, Orlando, FL, USA
Cali M. Fidopiastis

About this paper

Cite this paper

Aasen, O.J.A., Lugo, R.G., Knox, B.J. (2024). Small Languages and Big Models: Using ML to Generate Norwegian Language Social Media Content for Training Purposes. In: Schmorrow, D.D., Fidopiastis, C.M. (eds) Augmented Cognition. HCII 2024. Lecture Notes in Computer Science, vol 14695. Springer, Cham. https://doi.org/10.1007/978-3-031-61572-6_8

DOI: https://doi.org/10.1007/978-3-031-61572-6_8
Published: 01 June 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61571-9
Online ISBN: 978-3-031-61572-6
eBook Packages: Computer Science, Computer Science (R0)
