Prompt-based contrastive learning to combat the COVID-19 infodemic

Peng, Zifan; Li, Mingchen; Wang, Yue; Mo, Daniel Y.

doi:10.1007/s10994-024-06731-8

Published: 14 January 2025

Machine Learning, volume 114, article number 6 (2025)

Zifan Peng¹,
Mingchen Li²,
Yue Wang ORCID: orcid.org/0000-0002-0185-6172³ &
Daniel Y. Mo⁴

Abstract

The COVID-19 pandemic has brought about an influx of misinformation and disinformation online, especially on social media. The World Health Organization has identified combating this infodemic as one of its top priorities, because false and misleading information can lead to negative consequences such as the spread of conspiracy theories, false remedies, and xenophobia. This study presents a prompt-based contrastive learning approach to address this issue. The method is designed to overcome challenges such as data scarcity and class imbalance, which are common in social media data. Fighting the infodemic is modeled as a series of text classification problems, each answering a question about the credibility of a text, its potential harm to society, or the necessity of government intervention. Experiments show that prompt-based contrastive learning is effective in assessing the accuracy of COVID-19-related online text.
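As a rough illustration of the two ingredients named in the abstract, the sketch below wraps a post in a cloze-style prompt and computes a generic supervised contrastive loss over sentence embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the template wording, the `[MASK]` token, the function names, the temperature value, and the toy embeddings are all hypothetical, and the loss is the standard supervised contrastive formulation rather than one confirmed by the paper.

```python
import math


def build_prompt(text, question, mask_token="[MASK]"):
    """Wrap a post in a cloze-style template (hypothetical wording)."""
    return f"{text} Question: {question} Answer: {mask_token}."


def _normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not from the paper).

    Each anchor is pulled towards samples sharing its label and pushed away
    from all other samples. Anchors without any positive are skipped.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D "embeddings": the first two and the last two vectors are close.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
misaligned_loss = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
prompt = build_prompt("Drinking alcohol cures the virus.", "Is this claim harmful?")
```

In this toy setting the loss is lower when the labels agree with the geometric clusters than when they cut across them, which is the behaviour a contrastive objective rewards. In practice the embeddings would come from a pretrained language model applied to the prompted text, a step omitted here.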

Data availability

The data set is publicly available at https://github.com/firojalam/COVID-19-disinformation/tree/master/data.

Code availability

The code is available at https://github.com/yw57721/Prompt_Contrastive_Covid19.

Funding

This research is supported by Hong Kong Research Grant Council FDS UGC/FDS14/E05/22.

Author information

Authors and Affiliations

The Hong Kong University of Science and Technology, Sai Kung, Hong Kong SAR, China
Zifan Peng
Khoury College of Computer Sciences, Northeastern University, Boston, USA
Mingchen Li
The Education University of Hong Kong, Tai Po, Hong Kong SAR, China
Yue Wang
The Hang Seng University of Hong Kong, Sha Tin, Hong Kong SAR, China
Daniel Y. Mo

Contributions

ZP: processed the data, implemented the algorithm, and conducted the experiment; ML: implemented the algorithm and conducted the experiment; YW: conceived the idea, applied for the research funding, analyzed the results, and wrote the paper; DM: conducted the experiment and analyzed the results.

Corresponding author

Correspondence to Yue Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Ethics approval

Not applicable.

Additional information

Editors: Longbing Cao, David Anastasiu, Qi Zhang, Xiaolin Huang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, Z., Li, M., Wang, Y. et al. Prompt-based contrastive learning to combat the COVID-19 infodemic. Mach Learn 114, 6 (2025). https://doi.org/10.1007/s10994-024-06731-8

Received: 12 April 2024
Revised: 10 August 2024
Accepted: 23 September 2024
Published: 14 January 2025
DOI: https://doi.org/10.1007/s10994-024-06731-8
