Abstract
The virality of a tweet is essential for conveying its message to a broader audience and, eventually, for generating influence. This is especially important for news outlets as they struggle to transition from traditional media to online formats. Because their usual readers will not migrate directly to digital, news outlets need to gather new audiences in the spaces where real-time information and discussion happen: social media, and Twitter in particular. Since the language of news websites differs greatly from that of Twitter, news outlets need to write their tweets appropriately to maximize their impact on the platform. We propose a method that predicts, from its text alone, whether a tweet will be influential or non-influential. The method uses RoBERTa, a variant of Google's BERT, trained on a corpus of 5000 high-quality, automatically labeled highly influential and non-influential tweets. Our method reaches an F1 of 0.873, improving by 4 and 9, respectively, over approaches using LSTMs and n-grams.
This work was supported by the CONACYT, Mexico, under Grant A1-S-47854 and by the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional under Grants 20200859, 20211784, 20211884, and 20211178.
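For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how an F1 score can be computed for a binary influential/non-influential classifier. It is not the authors' evaluation code: the function name `f1_score`, the label encoding (1 = influential, 0 = non-influential), and the toy labels are all assumptions made for illustration, and the abstract does not say whether the reported figure is a positive-class or an averaged F1.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return the F1 of the `positive` class: the harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    if tp == 0:
        # No correct positive prediction: F1 is 0 by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration (1 = influential, 0 = non-influential).
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")  # prints "F1 = 0.750"
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 is 0.75.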
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Maldonado-Sifuentes, C.E., Angel, J., Sidorov, G., Kolesnikova, O., Gelbukh, A. (2021). Virality Prediction for News Tweets Using RoBERTa. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_7
DOI: https://doi.org/10.1007/978-3-030-89820-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89819-9
Online ISBN: 978-3-030-89820-5
eBook Packages: Computer Science (R0)