ABSTRACT
Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity.
We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, a problem where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.
Index Terms
- People on drugs: credibility of user statements in health communities