DOI: 10.1145/2623330.2623714
research-article

People on drugs: credibility of user statements in health communities

Published: 24 August 2014

ABSTRACT

Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work, we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end, we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity.

We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.
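The abstract does not spell out the model itself. As a rough intuition for how user trustworthiness and statement credibility can reinforce each other under distant supervision, here is a minimal Python sketch of a simple alternating, truth-discovery-style estimate. It is not the paper's probabilistic graphical model: it assumes per-post objectivity scores are given as input rather than learned, and the function name `estimate_credibility`, the averaging update rules, and the toy posts are all hypothetical.

```python
from collections import defaultdict


def estimate_credibility(posts, expert_labels, n_iter=20, prior=0.5):
    """Toy alternating estimate of user trust and statement credibility.

    Hypothetical illustration only, not the paper's model.
    posts: list of (user, statement, objectivity); objectivity in [0, 1] is
           assumed to come from some upstream language classifier.
    expert_labels: statement -> 1.0 (confirmed) or 0.0 (refuted); these play
           the role of distant supervision and are held fixed.
    """
    by_statement = defaultdict(list)  # statement -> [(user, objectivity), ...]
    by_user = defaultdict(list)       # user -> [statement, ...]
    for user, statement, objectivity in posts:
        by_statement[statement].append((user, objectivity))
        by_user[user].append(statement)

    trust = {user: prior for user in by_user}
    cred = {s: expert_labels.get(s, prior) for s in by_statement}

    for _ in range(n_iter):
        # Credibility: objectivity-weighted mean trust of the statement's
        # authors; expert-labelled statements stay clamped to their label.
        for s, supporters in by_statement.items():
            if s in expert_labels:
                continue
            weight = sum(obj for _, obj in supporters)
            if weight > 0:
                cred[s] = sum(obj * trust[u] for u, obj in supporters) / weight
        # Trustworthiness: mean credibility of the statements a user made.
        for user, statements in by_user.items():
            trust[user] = sum(cred[s] for s in statements) / len(statements)
    return trust, cred


# Hypothetical toy data: alice's checkable claims agree with the expert
# labels, bob's do not.
posts = [
    ("alice", "drugA causes nausea", 0.9),
    ("alice", "drugB causes headache", 0.8),
    ("alice", "drugC causes rash", 0.9),
    ("bob", "drugA cures cancer", 0.2),
    ("bob", "drugB causes insomnia", 0.3),
    ("bob", "drugC causes rash", 0.1),
]
expert = {"drugA causes nausea": 1.0, "drugA cures cancer": 0.0}

trust, cred = estimate_credibility(posts, expert)
print(trust)
print(cred)
```

On this toy input, alice ends up more trusted than bob, so her unverified claim about headaches is rated more credible than bob's claim about insomnia. The clamped expert labels are what break the symmetry; without them, every score would stay at the prior.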


Supplemental Material: p65-sidebyside.mp4 (MP4, 299.5 MB)


Published in

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330

Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
