Lexicon-Based System for Drug Abuse Entity Extraction from Twitter

Jenhani, Ferdaous; Gouider, Mohamed Salah; Said, Lamjed Ben

doi:10.1007/978-3-319-34099-9_54

Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)

Abstract

Drug abuse and addiction constitute a serious healthcare problem and social phenomenon that has not received the attention it deserves in scientific research, owing to a lack of information. Today, social media have become a ubiquitous source of information in this field, since they are the environment in which addicted individuals talk about their dependencies. However, extracting salient information from social media is difficult because of their noisy, dynamic, and unstructured character. In addition, natural language processing (NLP) tools are not designed to handle social data and cannot extract semantic, domain-specific entities.

In this paper, we propose a framework for real-time collection and analysis of Twitter data whose core is a customized NLP process for extracting drug abuse information. We extend the Stanford CoreNLP pipeline with a custom annotator based on fuzzy matching against drug abuse and addiction lexicons stored in a dictionary. Run on 86,041 tweets, our system achieved 82% accuracy.
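
The idea of fuzzy matching tweet tokens against a lexicon can be illustrated with a minimal sketch. This is not the authors' CoreNLP annotator: it is written in Python rather than Java, the lexicon entries and entity types are hypothetical, and the similarity measure (the ratio from the standard library's `difflib.SequenceMatcher`) and the 0.8 threshold are assumptions, since the abstract does not specify either.

```python
from difflib import SequenceMatcher

# Hypothetical mini-lexicon mapping surface forms to entity types.
# The paper's actual lexicons are not reproduced here.
LEXICON = {
    "heroin": "DRUG",
    "cocaine": "DRUG",
    "oxycodone": "DRUG",
    "overdose": "EFFECT",
    "withdrawal": "EFFECT",
}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an assumed stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def annotate(tweet: str, threshold: float = 0.8):
    """Return (token, lexicon_entry, entity_type) for tokens that fuzzily match."""
    matches = []
    for raw in tweet.lower().split():
        # Strip hashtag/mention markers and surrounding punctuation.
        token = raw.strip(".,!?#@:;\"'()")
        if not token:
            continue
        # Pick the closest lexicon entry, then accept it only above the threshold.
        best_entry = max(LEXICON, key=lambda entry: similarity(token, entry))
        if similarity(token, best_entry) >= threshold:
            matches.append((token, best_entry, LEXICON[best_entry]))
    return matches


if __name__ == "__main__":
    print(annotate("Can't sleep, herion withdrawl is killing me #overdose"))
```

The threshold trades recall for precision: lowering it catches more misspellings and slang variants typical of tweets, at the cost of more spurious matches against ordinary words.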

Notes

1. www.Twitter.com
2. https://www.nlm.nih.gov/research/umls/
3. http://knoesis-hpco.cs.wright.edu/predose/ontologies/DAO.owl
4. http://consumerhealthvocab.org/
5. http://www.fda.gov/
6. http://www.aemps.gob.es/cima
7. http://www.meddra.org/
8. http://cs.nyu.edu/faculty/grishman/muc6
9. http://www.drugabuse.gov
10. http://www.who.int
11. http://www.drugabuse.gov/
12. http://www.drugbank.ca/
13. https://www.noslang.com/drugs/dictionary
14. http://consumerhealthvocab.org

Author information

Authors and Affiliations

SOIE Laboratory, Institut Supérieur de Gestion de Tunis, Rue de la Liberté, Bouchoucha, Tunis, Tunisia
Ferdaous Jenhani, Mohamed Salah Gouider & Lamjed Ben Said

Corresponding author

Correspondence to Ferdaous Jenhani.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa

Cite this paper

Jenhani, F., Gouider, M.S., Said, L.B. (2016). Lexicon-Based System for Drug Abuse Entity Extraction from Twitter. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_54

DOI: https://doi.org/10.1007/978-3-319-34099-9_54
Published: 28 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)
