Lexicon-Based System for Drug Abuse Entity Extraction from Twitter

  • Conference paper
Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery (BDAS 2015, BDAS 2016)

Abstract

Drug abuse and addiction are a serious healthcare problem and social phenomenon that has not received the scientific attention it deserves, owing to a lack of information. Today, social media have become a ubiquitous source of information in this field, since they are the environment in which addicted individuals talk about their dependencies. However, extracting salient information from social media is difficult because the data are noisy, dynamic and unstructured. In addition, natural language processing (NLP) tools are not designed to handle social data and cannot extract semantic, domain-specific entities.

In this paper, we propose a framework for real-time collection and analysis of Twitter data, whose core is a customized NLP process for the extraction of drug abuse information. We extend the Stanford CoreNLP pipeline with a custom annotator based on fuzzy matching against drug abuse and addiction lexicons stored in a dictionary. Our system, run on 86,041 tweets, achieved an accuracy of 82%.
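The idea of fuzzy matching tweet tokens against a lexicon can be illustrated with a minimal sketch. This is not the authors' annotator: the abstract does not state which similarity measure or threshold the system uses, so the sketch assumes Python's standard `difflib.SequenceMatcher` ratio as a stand-in, a hypothetical five-entry lexicon, and an arbitrary threshold of 0.85. The names `LEXICON`, `similarity` and `annotate` are illustrative only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-lexicon (surface form -> entity type); the real
# drug abuse and addiction lexicons are far larger.
LEXICON = {
    "heroin": "DRUG",
    "cocaine": "DRUG",
    "oxycodone": "DRUG",
    "overdose": "EFFECT",
    "withdrawal": "EFFECT",
}


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] (stand-in measure, assumed)."""
    return SequenceMatcher(None, a, b).ratio()


def annotate(tweet: str, threshold: float = 0.85):
    """Tag each token whose best lexicon match scores at or above threshold.

    Returns a list of (token, matched_term, entity_type) tuples.
    """
    tokens = re.findall(r"[a-z]+", tweet.lower())
    matches = []
    for token in tokens:
        best_term, best_score = None, 0.0
        for term in LEXICON:
            score = similarity(token, term)
            if score > best_score:
                best_term, best_score = term, score
        if best_term is not None and best_score >= threshold:
            matches.append((token, best_term, LEXICON[best_term]))
    return matches
```

Calling `annotate("took too much heroine and almost overdosed")` tags the misspelled or inflected tokens `heroine` and `overdosed` with the lexicon entries `heroin` and `overdose`, while ordinary words fall below the threshold. A lower threshold would tolerate noisier spellings at the cost of more false matches.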

Notes

  1. www.Twitter.com.

  2. https://www.nlm.nih.gov/research/umls/.

  3. http://knoesis-hpco.cs.wright.edu/predose/ontologies/DAO.owl.

  4. http://consumerhealthvocab.org/.

  5. http://www.fda.gov/.

  6. http://www.aemps.gob.es/cima.

  7. http://www.meddra.org/.

  8. http://cs.nyu.edu/faculty/grishman/muc6.

  9. http://www.drugabuse.gov.

  10. http://www.who.int.

  11. http://www.drugabuse.gov/.

  12. http://www.drugbank.ca/.

  13. https://www.noslang.com/drugs/dictionary.

  14. http://consumerhealthvocab.org.

Author information

Corresponding author

Correspondence to Ferdaous Jenhani.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jenhani, F., Gouider, M.S., Said, L.B. (2016). Lexicon-Based System for Drug Abuse Entity Extraction from Twitter. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_54

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
