
SP-BERT: A Language Model for Political Text in Scandinavian Languages

  • Conference paper
  • Natural Language Processing and Information Systems (NLDB 2023)

Abstract

Language models are at the core of modern Natural Language Processing. We present a new BERT-style language model dedicated to political texts in Scandinavian languages. Concretely, we introduce SP-BERT, a model trained with parliamentary speeches in Norwegian, Swedish, Danish, and Icelandic. To show its utility, we evaluate its ability to predict the speakers’ party affiliation and explore language shifts of politicians transitioning between Cabinet and Opposition.
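SP-BERT is described as a BERT-style model, so pre-training presumably follows BERT's masked-language-modeling objective (Devlin et al., 2019): roughly 15% of tokens are selected as prediction targets, and of those, 80% are replaced by a mask token, 10% by a random token, and 10% left unchanged. The abstract gives no training details, so the following is only a generic, dependency-free sketch of that standard masking scheme; the token IDs, vocabulary size, and mask ID are illustrative.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=0):
    """BERT-style MLM masking: select ~mask_prob of positions as
    prediction targets; of those, 80% -> mask token, 10% -> random
    token, 10% unchanged. Returns (corrupted_ids, labels), where
    labels is -100 (ignored by the loss) except at target positions,
    which keep the original token id."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)  # this position is a prediction target
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_id)                    # replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.randrange(vocab_size))  # random token
            else:
                corrupted.append(tok)                        # keep original
        else:
            corrupted.append(tok)
            labels.append(-100)  # not a target; ignored by the loss
    return corrupted, labels

ids = list(range(100, 200))
corrupted, labels = mask_tokens(ids, mask_id=4, vocab_size=30000)
```

Downstream, the model's final hidden states would feed a classification head for the party-affiliation task described in the abstract.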


Notes

  1. https://data.stortinget.no/om-datatjenesten/bruksvilkar/.

  2. https://data.riksdagen.se/data/anforanden/.

  3. https://huggingface.co/tumd/sp-bert.

  4. https://huggingface.co/bert-base-multilingual-cased.

  5. https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html.

  6. Due to space limitations, we omit the detailed pre-processing steps.

  7. https://spacy.io.
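Footnote 5 points to scikit-learn's `compute_class_weight`, which suggests the authors weight the classification loss to counter imbalance among parties. Under scikit-learn's documented "balanced" heuristic, each class c receives weight n_samples / (n_classes * count(c)). A dependency-free sketch of that formula follows; the party labels and counts are purely illustrative.

```python
from collections import Counter

def balanced_class_weights(labels):
    """scikit-learn's 'balanced' heuristic:
    weight(c) = n_samples / (n_classes * count(c)),
    so rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Hypothetical imbalanced distribution of speeches over parties:
speeches = ["Ap"] * 60 + ["H"] * 30 + ["SV"] * 10
weights = balanced_class_weights(speeches)
```

Here the minority party "SV" (10 of 100 speeches) gets weight 100 / (3 * 10) ≈ 3.33, while the majority party "Ap" gets 100 / (3 * 60) ≈ 0.56, so the loss penalizes mistakes on rare classes more heavily.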


Acknowledgements

This work was done as part of the Trondheim Analytica project and funded under the Digital Transformation program at the Norwegian University of Science and Technology (NTNU), 7034 Trondheim, Norway.

Author information

Correspondence to Tu My Doan.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Doan, T.M., Kille, B., Gulla, J.A. (2023). SP-BERT: A Language Model for Political Text in Scandinavian Languages. In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds) Natural Language Processing and Information Systems. NLDB 2023. Lecture Notes in Computer Science, vol 13913. Springer, Cham. https://doi.org/10.1007/978-3-031-35320-8_34


  • DOI: https://doi.org/10.1007/978-3-031-35320-8_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35319-2

  • Online ISBN: 978-3-031-35320-8

  • eBook Packages: Computer Science, Computer Science (R0)
