Topic Modeling for Tracking COVID-19 Communication on Twitter

  • Conference paper
Information and Software Technologies (ICIST 2022)

Abstract

In this study, we analyze trends in COVID-19-related communication in the Croatian language on Twitter. First, we prepare a dataset of 147,028 tweets about COVID-19 posted during the first three waves of the pandemic, and then analyze it in three steps. In the first step, we train an LDA model and calculate the coherence values of the topics. We identify seven topics and report the ten most frequent words for each. In the second step, we analyze the proportion of tweets in each topic and report how these proportions change over time. In the third step, we study the spreading properties of each topic. The results show that all seven topics are evenly distributed across the three pandemic waves. The topic “vaccination” stands out, with its share rising from 14.6% of tweets in the first wave to 25.7% in the third wave. The obtained results contribute to a better understanding of pandemic communication on social media in Croatia.
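The second step described in the abstract, computing the proportion of tweets per topic in each wave, can be illustrated with a minimal sketch. It assumes each tweet has already been assigned a wave number and a single dominant topic; the function name `topic_shares_by_wave`, the topic labels other than "vaccination", and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def topic_shares_by_wave(assignments):
    """Return {wave: {topic: percentage of that wave's tweets}}.

    `assignments` is an iterable of (wave, topic) pairs, one per tweet.
    """
    counts = defaultdict(Counter)
    for wave, topic in assignments:
        counts[wave][topic] += 1

    shares = {}
    for wave, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[wave] = {
            topic: 100.0 * n / total for topic, n in topic_counts.items()
        }
    return shares


# Hypothetical toy data (assumption): one (wave, dominant topic) pair per tweet.
toy_assignments = [
    (1, "vaccination"), (1, "measures"), (1, "measures"), (1, "cases"),
    (3, "vaccination"), (3, "vaccination"), (3, "measures"), (3, "cases"),
]

shares = topic_shares_by_wave(toy_assignments)

# Change in the share of one topic between two waves, in percentage points.
change = shares[3]["vaccination"] - shares[1]["vaccination"]
print(f"vaccination: {shares[1]['vaccination']:.1f}% -> "
      f"{shares[3]['vaccination']:.1f}% ({change:+.1f} points)")
```

Comparing shares within each wave, rather than raw counts, keeps waves with different tweet volumes comparable; how the paper itself assigns tweets to topics is not specified in the abstract.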

Acknowledgement

This work has been supported in part by the Croatian Science Foundation under the project IP-CORONA-04-2061, “Multilayer Framework for the Information Spreading Characterization in Social Media during the COVID-19 Crisis” (InfoCoV), and by the University of Rijeka under projects uniri-drustv-18-20 and uniri-drustv-18-38. PKB is fully supported by the Croatian Science Foundation under the project DOK-2021-02.

Corresponding author

Correspondence to Petar Kristijan Bogović.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bogović, P.K., Meštrović, A., Martinčić-Ipšić, S. (2022). Topic Modeling for Tracking COVID-19 Communication on Twitter. In: Lopata, A., Gudonienė, D., Butkienė, R. (eds) Information and Software Technologies. ICIST 2022. Communications in Computer and Information Science, vol 1665. Springer, Cham. https://doi.org/10.1007/978-3-031-16302-9_19

  • DOI: https://doi.org/10.1007/978-3-031-16302-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16301-2

  • Online ISBN: 978-3-031-16302-9

  • eBook Packages: Computer Science (R0)
