Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search

ABSTRACT
The COVID-19 pandemic has brought about a proliferation of harmful news articles online, with sources lacking credibility and misrepresenting scientific facts. Misinformation has real consequences for consumer health search, i.e., users searching for health information. In the context of multi-stage ranking architectures, there has been little work exploring whether they prioritize correct and credible information over misinformation. We find that, indeed, training models on standard relevance ranking datasets like MS MARCO passage---which have been curated to contain mostly credible information---yields models that might also promote harmful misinformation. To rectify this, we propose a label prediction technique that can separate helpful from harmful content. Our design leverages pretrained sequence-to-sequence transformer models for both relevance ranking and label prediction. Evaluated at the TREC 2020 Health Misinformation Track, our techniques represent the top-ranked system: Our best submitted run was 19.2 points higher than the second-best run based on the primary metric, a 68% relative improvement. Additional post-hoc experiments show that we can boost effectiveness by another 3.5 points.
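The abstract describes a two-pronged design: a relevance ranker plus a label predictor that separates helpful from harmful content. As an illustrative sketch only (the linear fusion, the `alpha` weight, and the field names below are assumptions, not the fusion actually used by the system), the key idea of demoting topically relevant but likely incorrect documents can be expressed as:

```python
from dataclasses import dataclass

@dataclass
class ScoredDoc:
    doc_id: str
    relevance: float  # e.g., from a monoT5-style relevance reranker (higher = more relevant)
    p_correct: float  # label-prediction probability that the document is correct/credible

def fuse(docs, alpha=0.5):
    """Interpolate relevance and predicted correctness into one ranking score.

    Documents judged likely incorrect are demoted even when topically relevant.
    The interpolation weight is a hypothetical choice for illustration.
    """
    return sorted(
        docs,
        key=lambda d: alpha * d.relevance + (1 - alpha) * d.p_correct,
        reverse=True,
    )

docs = [
    ScoredDoc("credible", relevance=0.8, p_correct=0.9),  # fused: 0.85
    ScoredDoc("misinfo", relevance=0.9, p_correct=0.1),   # fused: 0.50
]
ranked = fuse(docs)
# The credible document outranks the more "relevant" misinformation page.
```

Under this sketch, a page that wins on pure relevance can still be pushed down the ranking when the label predictor assigns it a low probability of being correct, which is the behavior the track's harm-aware metric rewards.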