research-article

"Let Me Tell You About Your Mental Health!": Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention

Authors:
Manas Gaur, Wright State University, Dayton, OH, USA
Ugur Kursuncu, Wright State University, Dayton, OH, USA
Amanuel Alambo, Wright State University, Dayton, OH, USA
Amit Sheth, Wright State University, Dayton, OH, USA
Raminta Daniulaityte, Wright State University, Dayton, OH, USA
Krishnaprasad Thirunarayan, Wright State University, Dayton, OH, USA
Jyotishman Pathak, Cornell University, New York, NY, USA

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, Pages 753–762. https://doi.org/10.1145/3269206.3271732

Published: 17 October 2018

ABSTRACT

Social media platforms are increasingly used to share and seek advice on mental health issues. In particular, Reddit users freely discuss such issues on various subreddits, whose structure and content can be leveraged to formally interpret and relate subreddits and their posts in terms of mental health diagnostic categories. Prior research has addressed the extraction of mental health-related information, including symptoms, diagnoses, and treatments, from social media; our approach additionally provides actionable information to clinicians about the mental health of a patient in diagnostic terms for web-based intervention. Specifically, we provide a detailed analysis of the nature of subreddit content from a domain expert's perspective and introduce a novel approach to map each subreddit to the best-matching DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th Edition) category using a multi-class classifier. Our classification algorithm analyzes all the posts of a subreddit by adapting topic modeling and word-embedding techniques and by using curated medical knowledge bases to quantify their relationship to DSM-5 categories. Our semantic encoding-decoding optimization approach reduces the false alarm rate from 30% to 2.5% relative to a comparable heuristic baseline, and our mapping results have been verified by domain experts, achieving a kappa score of 0.84.
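The two evaluation figures quoted in the abstract, a false alarm rate and a kappa score, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the abstract does not say which kappa variant was used, so Cohen's kappa for two raters is assumed here, and the false alarm rate is assumed to mean FP / (FP + TN) for one category treated as positive. The function names and the toy labels ("depression", "anxiety") are hypothetical.

```python
from collections import Counter


def false_alarm_rate(y_true, y_pred, positive):
    """False alarm rate, assumed here to be FP / (FP + TN) for one class."""
    pairs = list(zip(y_true, y_pred))
    # False positives: predicted the class although the true label differs.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # True negatives: neither the true label nor the prediction is the class.
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters (an assumption; the source only says 'kappa')."""
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels, not data from the paper.
truth = ["depression", "anxiety", "anxiety", "anxiety", "anxiety"]
preds = ["depression", "depression", "anxiety", "anxiety", "anxiety"]
far = false_alarm_rate(truth, preds, positive="depression")

expert_1 = ["depression", "depression", "anxiety", "anxiety"]
expert_2 = ["depression", "depression", "anxiety", "depression"]
kappa = cohen_kappa(expert_1, expert_2)
```

With more than two expert raters, a multi-rater statistic such as Fleiss' kappa would be the more natural choice; the two-rater form is used here only because it is the simplest instance.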



Published in
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN:9781450360142
DOI:10.1145/3269206
General Chair: Alfredo Cuzzocrea (University of Trieste, Italy)
Program Chairs: James Allan (University of Massachusetts, USA), Norman Paton (University of Manchester, United Kingdom), Divesh Srivastava (AT&T Labs Research, USA), Rakesh Agrawal (Data Insights Lab, USA), Andrei Broder (Google Research, USA), Mohammed Zaki (Rensselaer Polytechnic Institute, USA), Selcuk Candan (Arizona State University, USA), Alexandros Labrinidis (University of Pittsburgh, USA), Assaf Schuster (Technion, Israel), Haixun Wang (Google Research, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
drug abuse ontology
dsm-5
medical knowledge bases
mental health
reddit
semantic encoding and decoding
semantic social computing
Qualifiers
- research-article
Acceptance Rates
CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Article Metrics
- Total Citations: 40
- Total Downloads: 1,667
- Downloads (Last 12 months): 219
- Downloads (Last 6 weeks): 36