Abstract
In this paper, we present a framework for assessing geopolitical news based on local sentiment and public attention. Our approach uses data from social media and local online press in Kenya, Nigeria, Senegal, and South Africa, covering both local languages and colonial languages (French and English). We focus on four main topics: Foreign Relations, Institutional Stability, Conflicts, and Nature and Pandemics, using specific keywords to retrieve relevant data. We build a pre-trained multilingual BERT model and fine-tune it for tasks such as text classification and sentiment analysis, with an emphasis on low-resource African languages. Our experiments compare several embedding approaches and show that our domain-specific model, ToumBERT, outperforms the multilingual BERT-base model. The proposed Geopolitical Risk measurement methodology employs three indices: Novelty, Severity, and Reach. Novelty captures the newness of a controversial geopolitical topic, Severity the harshness of the comments made in response to that controversy, and Reach its media coverage. Low scores in these indices signal potential political issues in specific topics, regions, and dates. To make it easier to monitor how the constructed metrics progress, we have implemented a dashboard that visualizes their temporal evolution. This dashboard supports multiple filters (dates, topics, countries, etc.) and gives direct access to the original documents as the primary information source.
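The filtering and aggregation that such a dashboard performs can be illustrated with a minimal Python sketch. The abstract does not give formulas for the indices, so nothing below reproduces the Novelty, Severity, or Reach definitions: the `Doc` record, the `sentiment` field (assumed to lie in [-1, 1], with negative values meaning harsher comments), and both function names are hypothetical and serve only to show per-country, per-topic, per-date grouping.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical record for one retrieved post or article."""
    country: str
    topic: str
    date: str          # ISO date string, e.g. "2024-01-05"
    sentiment: float   # assumed score in [-1, 1]; negative = harsher


def filter_docs(docs, country: Optional[str] = None, topic: Optional[str] = None):
    """Keep documents matching the optional country and topic filters."""
    return [
        d for d in docs
        if (country is None or d.country == country)
        and (topic is None or d.topic == topic)
    ]


def mean_sentiment_by_group(docs):
    """Map (country, topic, date) to (mean sentiment, document count)."""
    groups = defaultdict(list)
    for d in docs:
        groups[(d.country, d.topic, d.date)].append(d.sentiment)
    return {key: (mean(vals), len(vals)) for key, vals in groups.items()}


if __name__ == "__main__":
    sample = [
        Doc("Kenya", "Conflicts", "2024-01-05", -0.5),
        Doc("Kenya", "Conflicts", "2024-01-05", -0.25),
        Doc("Senegal", "Foreign Relations", "2024-01-05", 0.5),
    ]
    for key, (avg, count) in mean_sentiment_by_group(sample).items():
        print(key, avg, count)
```

In this toy setup, a low mean sentiment for a given group would be the kind of signal worth inspecting further; the document count is returned only for context and is not a stand-in for the Reach index.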
Availability of Data and Materials
The data used in this work all come from open sources, which are referenced in the body of the article.
Notes
Thanks to Google for granting us these resources through the Google TPU for Researchers program.
Acknowledgements
Special thanks to Google's Cloud TPU Research initiative for providing the TPU nodes needed to pre-train the model.
Funding
No specific funding was allocated for the completion of this work.
Author information
Authors and Affiliations
Contributions
Abdou Mohamed Naira wrote the main manuscript text and designed the different pipelines. Imade Benelallam and Youcef Rahmani reviewed the manuscript and contributed to the construction of the indices and to the scoping of this study.
Corresponding author
Ethics declarations
Conflict of Interest
We hereby declare that we have no conflict of interest related to the subject matter discussed in this paper. We affirm that we do not have any financial, personal, or professional affiliations that could be perceived as a conflict of interest regarding the research, analysis, or conclusions presented in this document. We have conducted this work with integrity and objectivity, ensuring that the information provided is accurate and unbiased.
Ethics Approval and Consent to Participate
All participants were fully informed of the study objectives, procedures, possible risks, and anticipated benefits before giving informed consent. Informed consent was obtained from each participant before their inclusion in the study. Participants were informed of their right to withdraw at any time without facing negative consequences. The data collected were treated confidentially, in accordance with ethical standards, and will be used for research purposes only. This ethical approval and participant consent demonstrate our commitment to rigorous and ethical research practices, ensuring the protection of the rights and well-being of the individuals involved in this study.
Consent for Publication
All individuals who are acknowledged or identifiable in this research article have provided consent for the publication of information related to their participation. Participants were informed that their data would be used for research purposes and could be included in scientific publications. Any potentially identifiable information will be handled with the utmost confidentiality and will not be disclosed without the explicit consent of the individuals involved.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abdou Mohamed, N., Benelallam, I. & Rahmani, Y. Monitoring african geopolitics: a multilingual sentiment and public attention framework. Appl Intell 55, 89 (2025). https://doi.org/10.1007/s10489-024-05905-0
DOI: https://doi.org/10.1007/s10489-024-05905-0