
People on drugs: credibility of user statements in health communities

Authors:
Subhabrata Mukherjee, Max Planck Institute for Informatics, Saarbruecken, Germany
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany
Cristian Danescu-Niculescu-Mizil, Max Planck Institute for Software Systems, Saarbruecken, Germany

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2014, Pages 65–74. https://doi.org/10.1145/2623330.2623714

Published: 24 August 2014

ABSTRACT

Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity.

We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.

Supplemental Material

p65-sidebyside.mp4 (mp4, 299.5 MB)

Index Terms

People on drugs: credibility of user statements in health communities
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published in
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN:9781450329569
DOI:10.1145/2623330
General Chairs: Sofus Macskassy (Facebook), Claudia Perlich (Dstillery)
Program Chairs: Jure Leskovec (Stanford University), Wei Wang (UCLA), Rayid Ghani (University of Chicago)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
credibility
objectivity
probabilistic graphical models
trustworthiness
veracity
Qualifiers
- research-article
Acceptance Rates
KDD '14 Paper Acceptance Rate: 151 of 1,036 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%