Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search

ABSTRACT
The COVID-19 pandemic has brought about a proliferation of harmful news articles online, with sources lacking credibility and misrepresenting scientific facts. Misinformation has real consequences for consumer health search, i.e., users searching for health information. In the context of multi-stage ranking architectures, there has been little work exploring whether they prioritize correct and credible information over misinformation. We find that, indeed, training models on standard relevance ranking datasets like MS MARCO passage---which have been curated to contain mostly credible information---yields models that might also promote harmful misinformation. To rectify this, we propose a label prediction technique that can separate helpful from harmful content. Our design leverages pretrained sequence-to-sequence transformer models for both relevance ranking and label prediction. Evaluated at the TREC 2020 Health Misinformation Track, our techniques represent the top-ranked system: Our best submitted run was 19.2 points higher than the second-best run based on the primary metric, a 68% relative improvement. Additional post-hoc experiments show that we can boost effectiveness by another 3.5 points.
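The abstract describes a two-pronged design: a relevance ranker plus a label predictor that separates helpful from harmful content. As an illustrative sketch only (the linear fusion, the `alpha` weight, and the field names below are assumptions, not the fusion actually used by the system), the key idea of demoting topically relevant but likely incorrect documents can be expressed as:

```python
from dataclasses import dataclass

@dataclass
class ScoredDoc:
    doc_id: str
    relevance: float  # e.g., from a monoT5-style relevance reranker (higher = more relevant)
    p_correct: float  # label-prediction probability that the document is correct/credible

def fuse(docs, alpha=0.5):
    """Interpolate relevance and predicted correctness into one ranking score.

    Documents judged likely incorrect are demoted even when topically relevant.
    The interpolation weight is a hypothetical choice for illustration.
    """
    return sorted(
        docs,
        key=lambda d: alpha * d.relevance + (1 - alpha) * d.p_correct,
        reverse=True,
    )

docs = [
    ScoredDoc("credible", relevance=0.8, p_correct=0.9),  # fused: 0.85
    ScoredDoc("misinfo", relevance=0.9, p_correct=0.1),   # fused: 0.50
]
ranked = fuse(docs)
# The credible document outranks the more "relevant" misinformation page.
```

Under this sketch, a page that wins on pure relevance can still be pushed down the ranking when the label predictor assigns it a low probability of being correct, which is the behavior the track's harm-aware metric rewards.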