Abstract
Despite the tremendous effort made before the release of every drug, some adverse drug reactions (ADRs) may go undetected and thus cause harm to both users and pharmaceutical companies. One plausible venue for collecting evidence of such ADRs is online social media, where patients and doctors discuss medical conditions and their treatments. There is substantial previous research on ADR extraction from English online forums, but very little has been done on Chinese data. In this paper, we use posts from two popular Chinese social media platforms as the original dataset. We propose a semi-supervised learning framework that detects mentions of medications and colloquial ADR terms and extracts lexico-syntactic features from natural language text to recognize positive associations between drug use and ADRs. The key contribution is an automatic label generation algorithm, which requires very little manual annotation. This bootstrapping algorithm could also be applied to English data. The experimental results indicate that our algorithm outperforms the hidden Markov model and conditional random fields. With this approach, we discovered a large number of side effects for a variety of popular medicines in real-world scenarios.




Notes
Sogou Pinyin is a Chinese input method for which many lexicons are available, one of which is the ADRs lexicon: http://pinyin.sogou.com/dict/detail/index/644.
AveP (average precision) is defined at https://en.wikipedia.org/wiki/Information_retrieval.
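As an illustration of the AveP measure referred to in the note above, the following is a minimal sketch of the standard information-retrieval definition: the precision at each rank that holds a relevant item, summed and divided by the number of relevant items. The function name `average_precision` and the optional `total_relevant` parameter are assumptions made for this example; they do not come from the paper, and the paper's own evaluation code may differ.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Average precision of a ranked list.

    relevance[i] is True when the item at rank i + 1 is relevant.
    total_relevant is the number of relevant items overall; when it is
    omitted, the number of relevant items found in the list is used
    (an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / rank.
            precision_sum += hits / rank
    denominator = hits if total_relevant is None else total_relevant
    return precision_sum / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical ranking: ranks 1 and 3 hold correct items.
    print(average_precision([True, False, True]))
```

Passing `total_relevant` penalizes a ranking that misses relevant items entirely, which matches the definition that divides by the number of relevant items in the whole collection.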
Acknowledgements
This work has been partially supported by AstraZeneca and NSFC grant 91646205.
Additional information
Responsible editor: Fei Wang
A List of 79 drugs studied



Cite this article
Zhang, M., Zhang, M., Ge, C. et al. Automatic discovery of adverse reactions through Chinese social media. Data Min Knowl Disc 33, 848–870 (2019). https://doi.org/10.1007/s10618-018-00610-2
DOI: https://doi.org/10.1007/s10618-018-00610-2