Abstract
Language models are at the core of modern Natural Language Processing. We present a new BERT-style language model dedicated to political texts in Scandinavian languages. Concretely, we introduce SP-BERT, a model trained on parliamentary speeches in Norwegian, Swedish, Danish, and Icelandic. To show its utility, we evaluate its ability to predict speakers’ party affiliation and explore language shifts among politicians transitioning between Cabinet and Opposition.
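To give a concrete picture of the downstream task described above, the sketch below shows how a BERT-style checkpoint could be used for party-affiliation classification with the Hugging Face transformers API. This is not the authors' code: the model identifier, the label set, and the example sentence are hypothetical placeholders, and the classification head would have to be fine-tuned on labelled speeches before its predictions are meaningful.

```python
# Minimal sketch (assumptions, not the paper's implementation): load a BERT-style
# checkpoint with a sequence-classification head and predict a party label for a speech.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "sp-bert"  # hypothetical identifier; substitute the released checkpoint
parties = ["Ap", "H", "FrP", "SV", "Sp"]  # illustrative label set, not the paper's

tokenizer = AutoTokenizer.from_pretrained(model_name)
# The classification head is randomly initialised here; in practice it is
# fine-tuned on speeches annotated with the speaker's party before use.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(parties)
)

speech = "Vi foreslår å styrke kommuneøkonomien i neste års budsjett."
inputs = tokenizer(speech, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_party = parties[logits.argmax(dim=-1).item()]
print(predicted_party)
```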
Notes
- 6. Due to space limitations, we omit the detailed pre-processing steps.
Acknowledgements
This work was done as part of the Trondheim Analytica project and funded under the Digital Transformation program at the Norwegian University of Science and Technology (NTNU), 7034 Trondheim, Norway.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Doan, T.M., Kille, B., Gulla, J.A. (2023). SP-BERT: A Language Model for Political Text in Scandinavian Languages. In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds) Natural Language Processing and Information Systems. NLDB 2023. Lecture Notes in Computer Science, vol 13913. Springer, Cham. https://doi.org/10.1007/978-3-031-35320-8_34
DOI: https://doi.org/10.1007/978-3-031-35320-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35319-2
Online ISBN: 978-3-031-35320-8
eBook Packages: Computer Science, Computer Science (R0)