Prompt-based contrastive learning to combat the COVID-19 infodemic

Machine Learning

Abstract

The COVID-19 pandemic has brought an influx of misinformation and disinformation online, especially on social media. The World Health Organization has identified combating this infodemic as one of its top priorities, because false and misleading information can lead to harmful consequences such as the spread of conspiracy theories, false remedies, and xenophobia. This study presents a prompt-based contrastive learning approach to address this problem. The method is designed to overcome the data scarcity and class imbalance commonly found in social media data. Fighting the infodemic is modeled as a series of text classification problems in which questions concerning the credibility of the texts, their potential harm to society, and the need for government intervention must be answered. Experiments show that prompt-based contrastive learning is effective in assessing the accuracy of COVID-19-related online text.
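The abstract names the two ingredients of the approach, prompt-based classification and contrastive learning, without spelling them out. As an illustration only, the following Python sketch assumes a cloze-style prompt with a verbalizer, in the spirit of Schick and Schütze (2021), and a generic supervised contrastive loss over sentence embeddings. The template wording, the label words, and the names `TEMPLATE`, `VERBALIZER`, `build_prompt`, `predict_label` and `supervised_contrastive_loss` are hypothetical. They are not taken from the paper; the authors' actual implementation is in the repository listed under Code availability.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical cloze template (not from the paper): a masked language model
# would fill the [MASK] slot with one of the label words in VERBALIZER.
TEMPLATE = "{text} Does this tweet contain false information? [MASK]."

# Hypothetical verbalizer mapping each class to its label word.
VERBALIZER = {"yes": "Yes", "no": "No"}


def build_prompt(text: str) -> str:
    """Wrap a raw tweet in the cloze template."""
    return TEMPLATE.format(text=text)


def predict_label(mask_word_scores: Dict[str, float]) -> str:
    """Return the class whose label word received the highest score at [MASK].

    The scores are assumed to come from a masked language model; computing
    them is outside the scope of this sketch.
    """
    return max(VERBALIZER, key=lambda label: mask_word_scores[VERBALIZER[label]])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(
    embeddings: List[Sequence[float]], labels: List[str], tau: float = 0.1
) -> float:
    """Generic supervised contrastive loss over one batch (an assumption,
    not the paper's exact objective).

    For each anchor i, every other sample with the same label is a positive,
    and all other samples in the batch form the softmax denominator. Anchors
    with no positive in the batch are skipped.
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {a: cosine(embeddings[i], embeddings[a]) / tau
                  for a in range(n) if a != i}
        # Numerically stable log-sum-exp over the denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits.values()))
        # Mean negative log-probability assigned to the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(build_prompt("Drinking alcohol cures COVID-19."))
    print(predict_label({"Yes": 2.3, "No": 0.4}))
    # Embeddings that cluster by label give a much lower loss than mismatched labels.
    print(supervised_contrastive_loss(batch, ["no", "no", "yes", "yes"]))
    print(supervised_contrastive_loss(batch, ["no", "yes", "no", "yes"]))
```

The contrastive term compares pairs of examples rather than scoring each example against its label alone, which is one common way to extract more training signal from a small labelled set. The temperature `tau=0.1` is an arbitrary choice for this sketch.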

Fig. 1
Fig. 2
Fig. 3

Data availability

The data set is publicly available at https://github.com/firojalam/COVID-19-disinformation/tree/master/data.

Code availability

The code is available at https://github.com/yw57721/Prompt_Contrastive_Covid19.

References

  • Abdelminaam, D. S., Ismail, F. H., Taha, M., Taha, A., Houssein, E. H., & Nabil, A. (2021). CoAID-DEEP: An optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter. IEEE Access, 9, 27840–27867.

  • Alam, F., Shaar, S., Dalvi, F., Sajjad, H., Nikolov, A., Mubarak, H., Da San Martino, G., Abdelali, A., Durrani, N., Darwish, K., Al-Homaid, A., Zaghouani, W., Caselli, T., Danoe, G., Stolk, F., Bruntink, B., & Nakov, P. (2021). Fighting the COVID-19 infodemic: Modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 611–649).

  • Aljazeera. (2020). Online resource, https://www.aljazeera.com/news/2020/4/27/iran-over-700-dead-after-drinking-alcohol-to-cure-coronavirus, Date of access: Dec 08, 2022

  • Ayoub, J., Yang, X. J., & Zhou, F. (2021). Combat COVID-19 infodemic using explainable natural language processing models. Information Processing & Management, 58(4), article 102569.

  • Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv:1607.06450.

  • Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. arXiv:1607.04606.

  • Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 632–642).

  • Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

  • Chadaga, K., Prabhu, S., Sampathila, N., Chadaga, R., & Umakanth, S. (2024). An explainable decision support framework for differential diagnosis between mild COVID-19 and other similar influenzas. IEEE Access, 12, 75010–75033.

  • Chatterjee, S., Bhattacharjee, S., Das, A. K., & Banerjee, S. (2024). Imbalanced COVID-19 vaccine sentiment classification with synthetic resampling coupled deep adversarial active learning. Machine Learning. https://doi.org/10.1007/s10994-024-06562-7

  • Chen, M.-Y., & Lai, Y.-W. (2022). Using fuzzy clustering with deep learning models for detection of COVID-19 disinformation. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3548458

  • Chen, M.-Y., Lai, Y.-W., & Lian, J.-W. (2022). Using deep learning models to detect fake news about COVID-19. ACM Transactions on Internet Technology. https://doi.org/10.1145/3533431

  • Dadgar, S., & Ghatee, M. (2021). Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives. arXiv:2107.09768.

  • Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional Transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 4171–4186).

  • Du, J., Dou, Y., Xia, C., Cui, L., Ma, J., & Yu, P. S. (2021). Cross-lingual COVID-19 fake news detection. In Proceedings of the 2021 international conference on data mining workshops (ICDMW) (pp. 859–862).

  • Editorial of The Lancet Infectious Diseases. (2020). The COVID-19 infodemic. The Lancet Infectious Diseases, 20(8), 875.

  • Elhadad, M. K., Li, K. F., & Gebali, F. (2021). An ensemble deep learning technique to detect COVID-19 misleading information. In L. Barolli, K. Li, T. Enokido, & M. Takizawa (Eds.), Advances in networked-based information systems. Springer.

  • Fang, H., Wang, S., Zhou, M., Ding, J., & Xie, P. (2020). CERT: Contrastive self-supervised learning for language understanding. arXiv:2005.12766.

  • Gao, T., Yao, X., & Chen, D. (2021). SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 conference on empirical methods in natural language processing (pp. 6894–6910).

  • Rizzo, G., Pereira, B., Varga, A., van Erp, M., & Cano Basave, A. E. (2017). Lessons learnt from the named entity recognition and linking (NEEL) challenge series. Semantic Web Journal, 8(5), 667–700.

  • Han, X., Zhao, W., Ding, N., Liu, Z., & Sun, M. (2022). PTR: Prompt tuning with rules for text classification. AI Open, 3(1), 182–192.

  • Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (GELUs). arXiv:1606.08415.

  • Hossain, T., Logan IV, R. L., Ugarte, A., Matsubara, Y., Young, S., & Singh, S. (2020). COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st workshop on NLP for COVID-19, Online.

  • Huang, J., Wang, Y., Ng, S. C. H., & Tsung, F. (2024). Overcoming the semantic gap in the customer-to-manufacturer (C2M) platform: A soft prompts-based approach with pretrained language models. International Journal of Production Economics, 272(6), 109248.

  • Jaimovitch-López, G., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., & Ramírez-Quintana, M. J. (2023). Can language models automate data wrangling? Machine Learning, 112, 2053–2082.

  • Joshi, A., Sparks, R., Karimi, S., Yan, S.-L., Chughtai, A., Paris, C., & MacIntyre, C. R. (2020). Automated monitoring of tweets for early detection of the 2014 Ebola epidemic. PLoS ONE, 15(3), e0230322.

  • Kawintiranon, K., Singh, L., & Budak, C. (2022). Traditional and context-specific spam detection in low resource settings. Machine Learning, 111, 2515–2536.

  • Kim, Y. (2014). Convolutional neural networks for sentence classification. In Proceedings of the 2014 conference on empirical methods in natural language processing (pp. 1746–1751).

  • Kolluri, N. L., & Murthy, D. (2021). CoVerifi: A COVID-19 news verification system. Online Social Networks and Media, 22, 100123.

  • Kou, Z., Shang, L., Zhang, Y., & Wang, D. (2022b). HC-COVID: A hierarchical crowdsource knowledge graph approach to explainable COVID-19 misinformation detection. Proceedings of the ACM on Human-Computer Interaction, 6, 1–25.

  • Kou, Z., Shang, L., Zhang, Y., Yue, Z., Zeng, H., & Wang, D. (2022a). Crowd, expert & AI: A human–AI interactive approach towards natural language explanation based COVID-19 misinformation detection. In Proceedings of the thirty-first international joint conference on artificial intelligence (IJCAI-22) (pp. 5087–5093).

  • Lin, Y. C., & Su, K.-Y. (2021). How fast can BERT learn simple natural language inference? In Proceedings of the 16th conference of the European chapter of the association for computational linguistics (pp. 626–633).

  • Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv:2107.13586.

  • Liu, Z., Xiong, C., Dai, Z., Sun, S., Sun, M., & Liu, Z. (2020). Adapting open domain fact extraction and verification to COVID-FACT through in-domain language modeling. In Findings of the association for computational linguistics: EMNLP (pp. 2395–2400).

  • Luo, L., & Wang, Y. (2019). EmotionX-HSU: Adopting pre-trained BERT for emotion classification. arXiv:1907.09669.

  • Luo, L., Wang, Y., & Liu, L. (2022). COVID-19 personal health mention detection from tweets using dual convolutional neural network. Expert Systems with Applications, 200, 117139.

  • Luo, L., Wang, Y., & Mo, D. Y. (2023). Identifying heart disease risk factors from electronic health records using an ensemble of deep learning method. IISE Transactions on Healthcare Systems Engineering, 13(3), 237–247.

  • Meng, Y., Xiong, C., Bajaj, P., Tiwary, S., Bennett, P., Han, J., & Song, X. (2021). COCO-LM: Correcting and contrasting text sequences for language model pretraining. arXiv:2102.08473.

  • Mohr, I., Wührl, A., & Klinger, R. (2022). CoVERT: A corpus of fact-checked biomedical COVID-19 tweets. In Proceedings of the thirteenth language resources and evaluation conference (pp. 244–257).

  • Paka, W. S., Bansal, R., Kaushik, A., Sengupta, S., & Chakraborty, T. (2021). Cross-SEAN: A cross-stitch semi-supervised neural attention model for COVID-19 fake news detection. Applied Soft Computing, 107, 107393.

  • Peng, Z., Li, M., Wang, Y., & Ho, G. T. S. (2023). Combating the COVID-19 infodemic using prompt-based curriculum learning. Expert Systems with Applications, 229(A), 120501.

  • Pulido, C. M., Villarejo-Carballido, B., Redondo-Sama, G., & Gómez, A. (2020). COVID-19 infodemic: More retweets for science-based information on coronavirus than for false information. International Sociology, 35(4), 377–392.

  • Qian, Z., Alaa, A. M., & van der Schaar, M. (2021). CPAS: The UK’s national machine learning-based hospital capacity planning system for COVID-19. Machine Learning, 110(1), 15–35.

  • Saakyan, A., Chakrabarty, T., & Muresan, S. (2021). COVID-Fact: Fact extraction and certification of real-world claims on COVID-19 pandemic. In Proceedings of the 59th annual meeting of the association for computational linguistics (pp. 2116–2129).

  • Sarrouti, M., Abacha, A. B., Mrabet, Y., & Demner-Fushman, D. (2021). Evidence-based fact-checking of health-related claims. In Findings of the association for computational linguistics EMNLP (pp. 3499–3512).

  • Serrano, J. C. M., Papakyriakopoulos, O., & Hegelich, S. (2020). NLP-based feature extraction for the detection of COVID-19 misinformation videos on YouTube. In Proceedings of the 1st workshop on NLP for COVID-19, Online.

  • Schick, T., & Schütze, H. (2021). Exploiting cloze questions for few-shot text classification and natural language inference. In Proceedings of the 16th conference of the European chapter of the association for computational linguistics (pp. 255–269).

  • Solayman, S., Aumi, S., Mery, C., Mubassir, M., & Khan, R. (2023). Automatic COVID-19 prediction using explainable machine learning techniques. International Journal of Cognitive Computing in Engineering, 4, 36–46.

  • Sushil, M., Suster, S., & Daelemans, W. (2021). Are we there yet? Exploring clinical domain knowledge of BERT models. In Proceedings of the 20th workshop on biomedical language processing (pp. 41–53). Online.

  • Talib, M. A., Afadar, Y., Nasir, Q., Nassif, A., Hijazi, H., & Hasasneh, A. (2024). A tree-based explainable AI model for early detection of Covid-19 using physiological data. BMC Medical Informatics and Decision Making, 24, 179.

  • Tan, Q., Song, X., Ye, G., & Wu, C. (2023). An effective negative sampling approach for contrastive learning of sentence embedding. Machine Learning, 112(11), 4837–4861.

  • Talman, A., & Chatzikyriakidis, S. (2019). Testing the generalization power of neural network models across NLI benchmarks. In Proceedings of the 2019 ACL workshop BlackboxNLP: Analyzing and interpreting neural networks for NLP (pp. 85–94).

  • van der Schaar, M., Alaa, A. M., Floto, A., Gimson, A., Scholtes, S., Wood, A., McKinney, E., Jarrett, E., Lio, P., & Ercole, A. (2021). How artificial intelligence and machine learning can help healthcare systems respond to COVID-19. Machine Learning, 110(1), 1–14.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Proceedings of 31st conference on neural information processing systems (NIPS 2017), Long Beach, CA, USA.

  • Vijjali, R., Potluri, P., Kumar, S., & Teki, S. (2020). Two stage Transformer model for COVID-19 fake news detection and fact checking. In Proceedings of the 3rd NLP4IF workshop on NLP for internet freedom: Censorship, disinformation, and propaganda, Barcelona, Spain (pp. 1–10).

  • Wani, A., Joshi, I., Khandve, S., Wagh, V., & Joshi, R. (2021). Evaluating deep learning approaches for covid19 fake news detection. In Proceedings of workshop on combating online hostile posts in regional languages during emergency situation, CONSTRAINT 2021, collocated with AAAI 2021 (pp. 153–163). Springer.

  • Wang, Y., & Li, X. (2021). Mining product reviews for needs-based product configurator design: A transfer learning-based approach. IEEE Transactions on Industrial Informatics, 17(9), 6192–6199.

  • Wang, Y., Zhao, W., & Wan, X. (2021). Needs-based product configurator design for mass customization using hierarchical attention network. IEEE Transactions on Automation Science and Engineering, 18(1), 195–204.

  • WHO. (2021). Infodemic, online resource. https://www.who.int/health-topics/infodemic#tab=tab_1

  • Williams, A., Nangia, N., & Bowman, S. (2018). A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies (pp. 1112–1122).

  • Wu, Z., Wang, S., Gu, J., Khabsa, M., Sun, F., & Ma, H. (2020). CLEAR: Contrastive learning for sentence representation. arXiv:2012.15466, 1–10.

  • Yakovyna, V., Shakhovska, N., & Szpakowska, A. (2024). A novel hybrid supervised and unsupervised hierarchical ensemble for COVID-19 cases and mortality prediction. Scientific Reports, 14, 9782.

  • Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 1480–1489).

Funding

This research is supported by the Hong Kong Research Grants Council under Faculty Development Scheme (FDS) grant UGC/FDS14/E05/22.

Author information

Contributions

ZP: processed the data, implemented the algorithm, and conducted the experiments; ML: implemented the algorithm and conducted the experiments; YW: conceived the idea, applied for the research funding, analyzed the results, and wrote the paper; DM: conducted the experiments and analyzed the results.

Corresponding author

Correspondence to Yue Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Ethics approval

Not applicable.

Additional information

Editors: Longbing Cao, David Anastasiu, Qi Zhang, Xiaolin Huang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, Z., Li, M., Wang, Y. et al. Prompt-based contrastive learning to combat the COVID-19 infodemic. Mach Learn 114, 6 (2025). https://doi.org/10.1007/s10994-024-06731-8

  • DOI: https://doi.org/10.1007/s10994-024-06731-8
