Exploring convolutional neural networks and topic models for user profiling from drug reviews

Tutubalina, Elena; Nikolenko, Sergey

doi:10.1007/s11042-017-5336-z

Published: 08 November 2017

Multimedia Tools and Applications, Volume 77, pages 4791–4809 (2018)

1017 Accesses
17 Citations
3 Altmetric

Abstract

Pharmacovigilance and, more generally, applications of natural language processing models to healthcare have attracted growing attention in recent years. In particular, drug reactions can be extracted from user reviews posted on the Web, and automated processing of this information represents a novel and exciting approach to personalized medicine and wide-scale drug testing. In medical applications, demographic information about the authors of these reviews, such as age and gender, is of primary importance; however, existing studies usually either assume that this information is available or overlook the issue entirely. In this work, we propose and compare several approaches to automated mining of demographic information from user-generated texts. We compare modern natural language processing techniques, including extensions of topic models and convolutional neural networks (CNNs), and apply both single-task and multi-task learning approaches to this problem. Based on a real-world dataset mined from a health-related website, we conclude that while CNNs that jointly learn different user attributes perform best at predicting demographic information, topic models provide additional information and reflect gender-specific and age-specific symptom profiles that may be of interest to researchers.
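The abstract contrasts single-task CNNs with CNNs that jointly learn several user attributes. The following is only a minimal forward-pass sketch of that multi-task idea. It assumes a text CNN in the style of Kim's 2014 sentence-classification model: one shared convolutional encoder with max-over-time pooling feeds two task-specific heads. The first head is a binary gender classifier and the second is a softmax over age groups, and their losses are summed.

This is not the authors' architecture. The names `MultiTaskTextCNN`, `conv1d_maxpool` and `joint_loss` are hypothetical, and so are the embedding size, filter count and width, the number of age groups, the label encoding and the loss weight `alpha`. Training by gradient descent is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(zs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_maxpool(embeddings, filters, biases, width):
    """Apply each filter to every window of `width` consecutive word vectors,
    then ReLU and max-over-time pooling (one feature per filter)."""
    features = []
    n_windows = len(embeddings) - width + 1
    for f, b in zip(filters, biases):
        best = 0.0  # max(0, s) over all windows == ReLU followed by max pooling
        for start in range(n_windows):
            window = embeddings[start:start + width]
            # f is a width x dim weight matrix, dotted element-wise with the window
            s = b + sum(w * x
                        for row_w, row_x in zip(f, window)
                        for w, x in zip(row_w, row_x))
            best = max(best, s)
        features.append(best)
    return features


class MultiTaskTextCNN:
    """Hypothetical multi-task model: one shared CNN encoder, two heads."""

    def __init__(self, vocab, dim=8, n_filters=4, width=2, n_age_groups=3, seed=0):
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        self.dim, self.width = dim, width
        self.emb = {word: rand_vec(dim) for word in vocab}
        # Shared encoder parameters (used by both tasks)
        self.filters = [[rand_vec(dim) for _ in range(width)] for _ in range(n_filters)]
        self.filter_biases = [0.0] * n_filters
        # Task-specific heads
        self.gender_w, self.gender_b = rand_vec(n_filters), 0.0
        self.age_w = [rand_vec(n_filters) for _ in range(n_age_groups)]
        self.age_b = [0.0] * n_age_groups

    def encode(self, tokens):
        # Unknown words map to zero vectors; short texts are zero-padded to `width`
        vecs = [self.emb.get(t, [0.0] * self.dim) for t in tokens]
        vecs += [[0.0] * self.dim] * max(0, self.width - len(vecs))
        return conv1d_maxpool(vecs, self.filters, self.filter_biases, self.width)

    def predict(self, tokens):
        h = self.encode(tokens)
        # Label encoding (1 = female) is an arbitrary assumption
        p_female = sigmoid(self.gender_b + sum(w * x for w, x in zip(self.gender_w, h)))
        age_logits = [b + sum(w * x for w, x in zip(row, h))
                      for row, b in zip(self.age_w, self.age_b)]
        return p_female, softmax(age_logits)

    def joint_loss(self, tokens, is_female, age_group, alpha=1.0):
        """Binary cross-entropy (gender) + alpha * cross-entropy (age)."""
        p, q = self.predict(tokens)
        eps = 1e-12
        gender_loss = -math.log(p + eps) if is_female else -math.log(1.0 - p + eps)
        age_loss = -math.log(q[age_group] + eps)
        return gender_loss + alpha * age_loss
```

For example, `MultiTaskTextCNN(vocab=["pill", "headache"]).predict(["pill", "headache"])` returns a gender probability and a probability distribution over the assumed age groups. The shared encoder is what makes the setup multi-task: gradients from both losses would update the same filters. A single-task baseline would instead train a separate encoder for each attribute.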

Similar content being viewed by others

Automated Prediction of Demographic Information from Medical User Reviews

Convolutional and Recurrent Neural Networks for Opinion Mining on Drug Reviews

Machine learning in medicine: a practical introduction to natural language processing (Article, open access, 31 July 2021)

Acknowledgements

This work was supported by the Russian Science Foundation grant no. 15-11-10019. The authors are grateful to Prof. Valery Solovyev for his continuous support. The authors also thank Ilseyar Alimova for her suggestions on related work.

Author information

Authors and Affiliations

Kazan (Volga Region) Federal University, Kazan, Russia
Elena Tutubalina & Sergey Nikolenko
Steklov Institute of Mathematics, St. Petersburg, Russia
Sergey Nikolenko

Corresponding author

Correspondence to Elena Tutubalina.

About this article

Cite this article

Tutubalina, E., Nikolenko, S. Exploring convolutional neural networks and topic models for user profiling from drug reviews. Multimed Tools Appl 77, 4791–4809 (2018). https://doi.org/10.1007/s11042-017-5336-z

Received: 26 January 2017
Revised: 19 August 2017
Accepted: 20 October 2017
Published: 08 November 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11042-017-5336-z
