DOI: 10.1145/3110025.3110114
research-article

Medical Persona Classification in Social Media

Published: 31 July 2017

Abstract

Identifying the medical persona behind a social media post is of paramount importance for drug marketing and pharmacovigilance. In this work, we propose multiple approaches to infer the medical persona associated with a social media post, posing the task as a supervised multi-label text classification problem. The main challenge is to identify the hidden cues in a post that indicate a particular persona. We first propose a large set of manually engineered features for this task. We then propose multiple neural-network-based architectures that extract useful features from these posts using pre-trained word embeddings. Experiments on thousands of blogs and tweets show that the proposed approach yields F-measure gains of 7% and 5% over the manual-feature-engineering approach for blogs and tweets, respectively.
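The multi-label framing can be illustrated with a small, hypothetical sketch; it is not the authors' architecture. It assumes a post is represented by the average of its pre-trained word vectors (a simple composition baseline). Each persona label is then scored by its own logistic unit, so one post can receive several personas at once, and predictions are evaluated with micro-averaged F-measure (one common multi-label variant; the abstract does not say which variant was used). The toy 3-dimensional embeddings, the label names, the hand-set weights, and the 0.5 threshold are all illustrative assumptions.

```python
import math

# Hypothetical toy "pre-trained" embeddings (assumption). Real vectors would
# come from a model such as word2vec or GloVe and have hundreds of dimensions.
EMBEDDINGS = {
    "my": [0.1, 0.0, 0.2],
    "mother": [0.9, 0.1, 0.0],
    "takes": [0.2, 0.3, 0.1],
    "i": [0.1, 0.0, 0.8],
    "prescribe": [0.0, 0.9, 0.1],
    "dose": [0.1, 0.7, 0.3],
}
DIM = 3

# Illustrative persona labels (assumption, not the paper's label set).
LABELS = ["patient", "caretaker", "consultant"]

# Hand-set per-label weights and biases (assumption); a real model would
# learn these from labelled posts.
WEIGHTS = {
    "patient": [0.0, 0.0, 6.0],
    "caretaker": [6.0, 0.0, 0.0],
    "consultant": [0.0, 6.0, 0.0],
}
BIASES = {label: -2.0 for label in LABELS}


def embed_post(tokens):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, biases, threshold=0.5):
    """Score every label independently; a post may receive any subset of labels."""
    predicted = set()
    for label in LABELS:
        z = sum(w * xi for w, xi in zip(weights[label], x)) + biases[label]
        if sigmoid(z) >= threshold:
            predicted.add(label)
    return predicted


def micro_f1(gold, pred):
    """Micro-averaged F-measure over parallel lists of label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy tokenised posts and gold label sets (hypothetical data).
POSTS = [["my", "mother", "takes"], ["prescribe", "dose"], ["i", "mother"]]
GOLD = [{"caretaker"}, {"consultant", "patient"}, {"patient", "caretaker"}]

if __name__ == "__main__":
    preds = [predict(embed_post(p), WEIGHTS, BIASES) for p in POSTS]
    print(preds)
    print(round(micro_f1(GOLD, preds), 3))  # → 0.889
```

Because each label has an independent sigmoid rather than a shared softmax, the third toy post is assigned both `patient` and `caretaker`. The second post shows a false negative (the gold set also contains `patient`), which lowers recall to 0.8 and micro-F1 to 8/9.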



Information

Published In

ASONAM '17: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
July 2017
698 pages
ISBN:9781450349932
DOI:10.1145/3110025

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASONAM '17

Acceptance Rates

Overall Acceptance Rate 116 of 549 submissions, 21%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 3
Reflects downloads up to 02 Mar 2025

Cited By

  • (2022) SECNLP. Journal of Biomedical Informatics, 101:C. DOI: 10.1016/j.jbi.2019.103323. Online publication date: 21-Apr-2022.
  • (2022) Leveraging Wikipedia Knowledge for Distant Supervision in Medical Concept Normalization. Experimental IR Meets Multilinguality, Multimodality, and Interaction, pages 33-47. DOI: 10.1007/978-3-031-13643-6_3. Online publication date: 25-Aug-2022.
  • (2021) Categorizing Sexism and Misogyny through Neural Approaches. ACM Transactions on the Web, 15:4, pages 1-31. DOI: 10.1145/3457189. Online publication date: 14-Jun-2021.
  • (2021) The Use of Persona Towards Human-Centered Design in Health Field: Review of Types and Technologies. 2021 International Conference on e-Health and Bioengineering (EHB), pages 1-4. DOI: 10.1109/EHB52898.2021.9657744. Online publication date: 18-Nov-2021.
  • (2020) Deep Contextualized Medical Concept Normalization in Social Media Text. Procedia Computer Science, 171, pages 1353-1362. DOI: 10.1016/j.procs.2020.04.145. Online publication date: 2020.
  • (2020) Evolving dictionary based sentiment scoring framework for patient authored text. Evolutionary Intelligence. DOI: 10.1007/s12065-020-00366-z. Online publication date: 18-Feb-2020.
  • (2018) Predictive Analysis on Twitter: Techniques and Applications. Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining, pages 67-104. DOI: 10.1007/978-3-319-94105-9_4. Online publication date: 18-Sep-2018.
