Mining Frequency of Drug Side Effects over a Large Twitter Dataset Using Apache Spark

Authors:

Teng-Sheng Moh

ASONAM '17: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017

Pages 915 - 924

https://doi.org/10.1145/3110025.3110110

Published: 31 July 2017

Abstract

Despite clinical trials by pharmaceutical companies and the FDA's current reporting systems, some drug side effects still go undetected. One way to obtain a larger sample of reports is to mine online social media. Widely used platforms such as Twitter generate massive amounts of data, which can serve as reports of drug side effects. Apache Spark has become a popular choice for processing such large datasets because it supports fast, distributed batch processing. In this work, we improve on previous sentiment analysis-based pipelines for mining, processing, and extracting tweets that report drug-caused side effects. We add a new ensemble classifier that combines sentiment analysis features to increase the accuracy of identifying drug-caused side effects, and we report a frequency count for each side effect. We also implement the same pipeline in Apache Spark, which makes tweet processing 2.5 times faster and supports large tweet datasets. Because the frequency count of drug side effects opens the door to further analysis, we present a preliminary study of this issue, covering the side effects of using two drugs simultaneously and the potential danger of using less common combinations of drugs. We believe the pipeline design and the results presented in this work have significant implications for the study of drug side effects and for big data analysis in general.
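The frequency-count step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the drug and side-effect lexicons, the exact-token matching, and the function names below are all hypothetical, and every input tweet is assumed to have already been classified as a genuine side-effect report. The sketch counts how often each side effect is mentioned alongside a single drug and alongside a pair of drugs named in the same tweet.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicons; the paper's actual drug and side-effect
# vocabularies are not given in the abstract.
DRUGS = {"ibuprofen", "aspirin", "xanax"}
SIDE_EFFECTS = {"nausea", "headache", "dizziness"}


def extract_mentions(tweet):
    """Return (drugs, side effects) found in a tweet by exact token match."""
    tokens = {tok.strip(".,!?#@").lower() for tok in tweet.split()}
    return sorted(tokens & DRUGS), sorted(tokens & SIDE_EFFECTS)


def count_side_effects(tweets):
    """Count side-effect mentions per single drug and per pair of drugs."""
    single = Counter()
    pairs = Counter()
    for tweet in tweets:
        drugs, effects = extract_mentions(tweet)
        for effect in effects:
            # One count for every drug named in the same tweet.
            for drug in drugs:
                single[(drug, effect)] += 1
            # One count for every unordered pair of co-mentioned drugs.
            for pair in combinations(drugs, 2):
                pairs[(pair, effect)] += 1
    return single, pairs


# Made-up example tweets, assumed to be already classified as reports.
tweets = [
    "Took ibuprofen and now I have nausea",
    "aspirin plus ibuprofen gave me a headache!",
    "Ibuprofen again... nausea all day",
]
single_counts, pair_counts = count_side_effects(tweets)
print(single_counts.most_common(3))
print(pair_counts.most_common(3))
```

Each tweet contributes its counts independently of the others, so the same logic maps naturally onto a distributed job: in PySpark, for instance, the inner loops could become a `flatMap` that emits keys and a `reduceByKey` that sums them. That mapping is a suggestion, not a description of the implementation in the paper.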

Index Terms
1. Computing methodologies
  1. Artificial intelligence
2. Information systems
  1. Information systems applications

Published In

ASONAM '17: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017

July 2017

698 pages

ISBN:9781450349932

DOI:10.1145/3110025

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ASONAM '17

Sponsor:

SIGKDD

ASONAM '17: Advances in Social Networks Analysis and Mining 2017

July 31 - August 3, 2017

Sydney, Australia

Acceptance Rates

Overall Acceptance Rate 116 of 549 submissions, 21%
