DOI: 10.1145/3366423.3380198
research-article

Leveraging Sentiment Distributions to Distinguish Figurative From Literal Health Reports on Twitter

Published: 20 April 2020

ABSTRACT

Harnessing data from social media to monitor health events is a promising avenue for public health surveillance. A key step is the detection of reports of a disease (referred to as ‘health mention classification’) amongst tweets that mention disease words. Prior work shows that figurative usage of disease words can be challenging for health mention classification. Since the experience of a disease is associated with negative sentiment, we present a method that utilises sentiment information to improve health mention classification. Specifically, our classifier combines pre-trained contextual word representations with sentiment distributions of words in the tweet. For our experiments, we extend a benchmark dataset of tweets for health mention classification, adding over 14k manually annotated tweets across diseases. We additionally annotate each tweet with a label that indicates whether the disease words are used in a figurative sense. Our classifier outperforms current state-of-the-art approaches in detecting both health-related and figurative tweets that mention disease words. We also show that disease words in tweets are used figuratively more often than in a health-related context, which makes such tweets challenging for classifiers targeting health-related tweets.
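The idea of pairing a contextual representation with a sentiment distribution over the words of a tweet can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy lexicon `TOY_LEXICON`, the three-bin histogram, the helper names `sentiment_distribution` and `combine_features`, and the placeholder vector standing in for a pre-trained contextual embedding.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> sentiment score in [-1, 1]).
# A real system would use an established sentiment resource instead.
TOY_LEXICON: Dict[str, float] = {
    "terrible": -0.8,
    "fever": -0.6,
    "hospital": -0.4,
    "great": 0.7,
    "love": 0.9,
}


def sentiment_distribution(tokens: List[str], bins: int = 3) -> List[float]:
    """Return a normalised histogram of per-word sentiment scores.

    With bins=3 the buckets roughly mean negative, neutral, positive.
    Words missing from the lexicon are treated as neutral (score 0.0).
    """
    counts = [0] * bins
    for token in tokens:
        score = TOY_LEXICON.get(token.lower(), 0.0)
        # Map a score in [-1, 1] onto a bin index in [0, bins - 1].
        index = min(int((score + 1.0) / 2.0 * bins), bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def combine_features(contextual: List[float], tokens: List[str]) -> List[float]:
    """Concatenate a (placeholder) contextual vector with the sentiment histogram."""
    return list(contextual) + sentiment_distribution(tokens)


# Illustrative usage: the 2-dimensional vector is a stand-in for a real
# contextual embedding, which would typically have hundreds of dimensions.
tokens = ["I", "have", "a", "terrible", "fever"]
features = combine_features([0.1, 0.2], tokens)
```

The combined vector would then be passed to a downstream classifier. Simple concatenation is only one possible way to merge the two signals, and the paper may combine them differently.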

Published in

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
