Research article, CIKM Conference Proceedings
DOI: 10.1145/3269206.3271732

"Let Me Tell You About Your Mental Health!": Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention

Published: 17 October 2018

ABSTRACT

Social media platforms are increasingly used to share and seek advice on mental health issues. In particular, Reddit users discuss such issues freely in various subreddits, whose structure and content can be leveraged to formally interpret subreddits and their posts and to relate them to mental health diagnostic categories. Prior research has extracted mental health-related information, including symptoms, diagnoses, and treatments, from social media; our approach additionally provides clinicians with actionable information about a patient's mental health, expressed in diagnostic terms, to support web-based intervention. Specifically, we provide a detailed analysis of the nature of subreddit content from a domain expert's perspective and introduce a novel approach that maps each subreddit to the best-matching DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th Edition) category using a multi-class classifier. Our classification algorithm analyzes all the posts of a subreddit by adapting topic modeling and word-embedding techniques, and uses curated medical knowledge bases to quantify each subreddit's relationship to the DSM-5 categories. Our semantic encoding-decoding optimization approach reduces the false alarm rate from 30% to 2.5% relative to a comparable heuristic baseline, and our mapping results, verified by domain experts, achieve a kappa score of 0.84.
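The two evaluation figures in the abstract can be made concrete with a small sketch. The abstract does not define either metric, so this assumes (i) the false alarm rate is the one-vs-rest rate FP / (FP + TN), macro-averaged over DSM-5 categories, and (ii) the kappa score is Cohen's kappa between the model's subreddit-to-category mapping and the experts' labels. The function names, category names, and labels below are hypothetical toy data chosen for illustration, not the authors' code or results.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa between two label sequences over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def false_alarm_rate(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> float:
    """One-vs-rest false alarm rate FP / (FP + TN), macro-averaged over labels."""
    rates = []
    for c in labels:
        # Negatives for class c are items whose true label is not c.
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = sum(t != c and p != c for t, p in zip(y_true, y_pred))
        if fp + tn:  # skip a class that has no negatives
            rates.append(fp / (fp + tn))
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical toy data: expert labels vs. model mapping for four subreddits.
categories = ["depressive", "anxiety", "bipolar"]
expert_labels = ["depressive", "anxiety", "bipolar", "depressive"]
model_labels = ["depressive", "anxiety", "depressive", "depressive"]

print(round(cohens_kappa(expert_labels, model_labels), 3))  # 0.556
print(round(false_alarm_rate(expert_labels, model_labels, categories), 3))  # 0.167
```

Macro-averaging weights each category equally, so a rare category with a few false alarms can move the average noticeably; a micro-averaged rate would instead pool the FP and TN counts across all categories before dividing.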
