Abstract
Since the beginning of 2024, several countries have been experiencing an outbreak of measles. In today's Internet of Everything era, social media platforms such as YouTube and TikTok have gained widespread global popularity because they make it easy to create and share videos. During recent virus outbreaks, videos on social media platforms played a crucial role in keeping the global population informed about various aspects of those outbreaks. As a result, researchers from different disciplines have, in the last few years, focused on developing datasets of videos published on YouTube, TikTok, and similar websites. However, no prior work in this field has developed a dataset of social media videos about the ongoing measles outbreak. This paper addresses this research gap by presenting a dataset that contains the data of 4011 videos about the ongoing outbreak of measles, published on 264 websites between January 1, 2024, and May 31, 2024, and available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remaining websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each video, the URL of the video, the title of the post, the description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grained sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral; (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated; and (iii) one of the fine-grained sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset to support the training and testing of machine learning algorithms for sentiment analysis or subjectivity analysis in this field, as well as other applications. Finally, this paper also presents a list of open research questions that may be investigated using this dataset.
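The abstract does not say how VADER scores were mapped to the three sentiment classes. The following is a minimal sketch, assuming the typical ±0.05 cut-off on VADER's normalized `compound` score given in the VADER documentation. The function name `vader_label` is hypothetical, and the score is passed in directly so the example runs without the `vaderSentiment` package.

```python
def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment class.

    Assumes the commonly used +/-0.05 cut-off; the paper's exact rule
    is not stated in the abstract.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("VADER compound scores lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# In practice the score would come from the vaderSentiment package, e.g.
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(title)["compound"]
for score in (0.62, 0.0, -0.3):
    print(score, vader_label(score))
```

Keeping the thresholding separate from the scorer makes it easy to re-label the dataset's titles and descriptions if a different cut-off is preferred.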
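The cut-offs used to turn TextBlob's subjectivity score into the three opinion classes are also not given in the abstract. That score is a float in [0, 1], where 0 is very objective and 1 is very subjective. The sketch below uses hypothetical, adjustable thresholds around 0.5. The names `subjectivity_label`, `low`, and `high` are illustrative and are not part of TextBlob.

```python
def subjectivity_label(subjectivity: float, low: float = 0.5, high: float = 0.5) -> str:
    """Map a TextBlob subjectivity score in [0, 1] to an opinion class.

    Hypothetical cut-offs: with the defaults, scores above 0.5 are
    "highly opinionated", scores below 0.5 are "least opinionated", and
    exactly 0.5 is "neutral opinionated". Choose low < high to widen
    the neutral band.
    """
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("TextBlob subjectivity lies in [0, 1]")
    if low > high:
        raise ValueError("low must not exceed high")
    if subjectivity > high:
        return "highly opinionated"
    if subjectivity < low:
        return "least opinionated"
    return "neutral opinionated"


# The score itself would come from TextBlob, e.g.
#   from textblob import TextBlob
#   subjectivity = TextBlob(description).sentiment.subjectivity
for s in (0.8, 0.5, 0.1):
    print(s, subjectivity_label(s))
print(0.45, subjectivity_label(0.45, low=0.4, high=0.6))
```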
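For the fine-grained step, the DistilRoBERTa-base model is presumably `j-hartmann/emotion-english-distilroberta-base` on Hugging Face, which the paper cites. That model scores the same seven classes listed above. Assuming each title or description receives its highest-scoring class, the final selection can be sketched with the standard library alone. The input shape (a list of `label`/`score` dicts) and the name `emotion_label` are assumptions.

```python
from typing import Any, Iterable, Mapping

# The seven classes named in the abstract, assumed to match the labels
# emitted by the DistilRoBERTa emotion model.
EMOTIONS = frozenset(
    {"anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"}
)


def emotion_label(scores: Iterable[Mapping[str, Any]]) -> str:
    """Return the highest-scoring emotion among per-label model scores.

    Raises ValueError on empty input or on a label outside EMOTIONS.
    """
    items = list(scores)
    if not items:
        raise ValueError("no scores given")
    best = max(items, key=lambda item: float(item["score"]))
    label = str(best["label"]).lower()
    if label not in EMOTIONS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


# With the transformers library installed, the scores could be obtained via
#   from transformers import pipeline
#   clf = pipeline("text-classification",
#                  model="j-hartmann/emotion-english-distilroberta-base",
#                  top_k=None)
#   scores = clf(title)  # nesting of the result varies across versions
print(emotion_label([{"label": "fear", "score": 0.71},
                     {"label": "neutral", "score": 0.20},
                     {"label": "sadness", "score": 0.09}]))
```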
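The platform shares reported above can be recomputed from the dataset's URL attribute. With 4011 videos, 48.6% for YouTube and 15.2% for TikTok correspond to about 1,950 and 610 videos, respectively. The sketch below assumes a hand-written host-to-platform mapping (`PLATFORM_HOSTS` is hypothetical and incomplete) and groups every other site by its host name.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, incomplete host-to-platform mapping; extend as needed.
PLATFORM_HOSTS = {
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
    "tiktok.com": "TikTok",
    "instagram.com": "Instagram",
    "facebook.com": "Facebook",
}


def platform_of(url: str) -> str:
    """Return the platform for a video URL, or its bare host name otherwise."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORM_HOSTS.items():
        # Match the domain itself and any subdomain (www., m., ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return host[4:] if host.startswith("www.") else host


def platform_shares(urls: list[str]) -> dict[str, float]:
    """Percentage of videos per platform, rounded to one decimal place."""
    counts = Counter(platform_of(u) for u in urls)
    total = sum(counts.values())
    return {p: round(100 * n / total, 1) for p, n in counts.most_common()}


urls = [
    "https://www.youtube.com/watch?v=abc",
    "https://youtu.be/xyz",
    "https://m.youtube.com/watch?v=def",
    "https://www.tiktok.com/@user/video/123",
    "https://www.example-news.org/measles-video",
]
print(platform_shares(urls))
```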
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Thakur, N. et al. (2025). A Labeled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources About the 2024 Outbreak of Measles. In: Coman, A., Vasilache, S., Fui-Hoon Nah, F., Siau, K.L., Wei, J., Margetis, G. (eds) HCI International 2024 – Late Breaking Papers. HCII 2024. Lecture Notes in Computer Science, vol 15375. Springer, Cham. https://doi.org/10.1007/978-3-031-76806-4_17
DOI: https://doi.org/10.1007/978-3-031-76806-4_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-76805-7
Online ISBN: 978-3-031-76806-4
eBook Packages: Computer Science, Computer Science (R0)