Abstract
In this study, we analyze trends in COVID-19-related communication in the Croatian language on Twitter. We first prepare a dataset of 147,028 tweets about COVID-19 posted during the first three waves of the pandemic and then perform the analysis in three steps. In the first step, we train a latent Dirichlet allocation (LDA) model and calculate the coherence values of its topics; we identify seven topics and report the ten most frequent words for each. In the second step, we analyze the proportion of tweets in each topic and report how these proportions change over time. In the third step, we study the spreading properties of each topic. The results show that all seven topics are evenly distributed across the three pandemic waves. The topic “vaccination” stands out, rising from 14.6% of tweets in the first wave to 25.7% in the third wave. The results contribute to a better understanding of pandemic communication on social media in Croatia.
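The second step (the share of tweets per topic in each wave) can be illustrated with a minimal sketch. It assumes that an LDA model has already produced a topic mixture for every tweet and that each tweet is counted under its dominant (highest-probability) topic; the abstract does not state how tweets are assigned to topics, so this rule is an assumption. The wave boundaries in `WAVES` are hypothetical placeholders, not the dates used in the study, and all function names are illustrative.

```python
from collections import Counter
from datetime import date

# Hypothetical wave boundaries (inclusive start, inclusive end).
# The abstract does not give the actual dates used in the study.
WAVES = {
    "wave1": (date(2020, 2, 25), date(2020, 6, 30)),
    "wave2": (date(2020, 7, 1), date(2021, 1, 31)),
    "wave3": (date(2021, 2, 1), date(2021, 6, 30)),
}


def dominant_topic(mixture):
    """Index of the most probable topic in a tweet's topic mixture (assumed assignment rule)."""
    return max(range(len(mixture)), key=lambda k: mixture[k])


def wave_of(day, waves=WAVES):
    """Name of the wave that contains `day`, or None if it falls outside every wave."""
    for name, (start, end) in waves.items():
        if start <= day <= end:
            return name
    return None


def topic_shares_per_wave(tweets, num_topics, waves=WAVES):
    """Percentage of tweets per dominant topic, computed separately for each wave.

    `tweets` is an iterable of (date, topic_mixture) pairs.
    """
    counts = {name: Counter() for name in waves}
    for day, mixture in tweets:
        name = wave_of(day, waves)
        if name is not None:
            counts[name][dominant_topic(mixture)] += 1
    shares = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        shares[name] = [100.0 * counter[k] / total if total else 0.0
                        for k in range(num_topics)]
    return shares


# Toy input: five tweets with three-topic mixtures (invented for illustration).
tweets = [
    (date(2020, 3, 10), [0.7, 0.2, 0.1]),
    (date(2020, 4, 2), [0.1, 0.8, 0.1]),
    (date(2020, 11, 5), [0.2, 0.2, 0.6]),
    (date(2021, 3, 15), [0.1, 0.1, 0.8]),
    (date(2021, 4, 20), [0.6, 0.3, 0.1]),
]
for wave, shares in topic_shares_per_wave(tweets, num_topics=3).items():
    print(wave, shares)
# wave1 [50.0, 50.0, 0.0]
# wave2 [0.0, 0.0, 100.0]
# wave3 [50.0, 0.0, 50.0]
```

Comparing the same topic index across waves in this output is how a shift such as the reported rise of “vaccination” from 14.6% to 25.7% would show up. Counting only the dominant topic keeps each tweet in exactly one bucket; summing the full mixtures instead would be a reasonable alternative that credits mixed-topic tweets to several topics.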
Acknowledgement
This work has been supported in part by the Croatian Science Foundation under the project IP-CORONA-04-2061, “Multilayer Framework for the Information Spreading Characterization in Social Media during the COVID-19 Crisis” (InfoCoV), and by the University of Rijeka projects uniri-drustv-18-20 and uniri-drustv-18-38. PKB is fully supported by the Croatian Science Foundation under the project DOK-2021-02.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bogović, P.K., Meštrović, A., Martinčić-Ipšić, S. (2022). Topic Modeling for Tracking COVID-19 Communication on Twitter. In: Lopata, A., Gudonienė, D., Butkienė, R. (eds) Information and Software Technologies. ICIST 2022. Communications in Computer and Information Science, vol 1665. Springer, Cham. https://doi.org/10.1007/978-3-031-16302-9_19
DOI: https://doi.org/10.1007/978-3-031-16302-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16301-2
Online ISBN: 978-3-031-16302-9
eBook Packages: Computer Science, Computer Science (R0)