Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning

  • Conference paper
Information Management and Big Data (SIMBig 2021)

Abstract

In recent years, suicide has become one of the most critical public health issues among teenagers and adults. At the same time, the growth and widespread adoption of social networks and mobile devices have made it possible to compile information that helps us understand the thoughts, feelings, and emotions expressed on these platforms. The detection of suicidal traits on social media has therefore become a relevant research topic, since it permits the identification of probable suicidal traits among users by examining their posts on well-known social networks such as Reddit. For that reason, the purpose of the present research is to compare different supervised classification models such as Logistic Regression, Support Vector Machines (SVM), Random Forest, AdaBoost, Gradient Boosting, and XGBoost, together with feature extraction techniques such as TF-IDF and GloVe. The results of our experiments show that the best model is SVM with TF-IDF, which obtains 91.50% Accuracy, 92.40% Precision, 90.30% Recall, and a 91.50% F1-score. This study also shows that TF-IDF outperforms GloVe for feature extraction across the different models tested.
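The best-performing combination named in the abstract, TF-IDF features fed to an SVM, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a linear kernel, default hyperparameters, and a tiny hypothetical placeholder corpus. None of these details are given in the abstract.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical placeholder texts, NOT the paper's dataset.
# Label 1 = flagged as containing the trait of interest, 0 = not flagged.
train_texts = [
    "i feel hopeless and completely alone",
    "nothing matters anymore and i feel empty",
    "i am so tired of feeling hopeless",
    "the football match last night was great",
    "i baked bread with my sister today",
    "looking for advice on a new laptop",
]
train_labels = [1, 1, 1, 0, 0, 0]

test_texts = [
    "i feel empty and alone",
    "any advice on baking bread",
]
test_labels = [1, 0]

# Assumed setup: word-level TF-IDF followed by a linear-kernel SVM.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(train_texts, train_labels)

predictions = model.predict(test_texts).tolist()

# The four metrics reported in the abstract, computed on the toy test split.
scores = {
    "accuracy": accuracy_score(test_labels, predictions),
    "precision": precision_score(test_labels, predictions, zero_division=0),
    "recall": recall_score(test_labels, predictions, zero_division=0),
    "f1": f1_score(test_labels, predictions, zero_division=0),
}

for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

Swapping `SVC` for another scikit-learn classifier (for example `LogisticRegression` or `RandomForestClassifier`) inside the same pipeline would give a like-for-like comparison in the spirit of the study. The scores printed on this toy data say nothing about the figures reported in the paper.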

Notes

  1. https://www.reddit.com/r/SuicideWatch/

  2. https://www.kaggle.com/nikhileswarkomati/suicide-watch

  3. https://pypi.org/project/zeugma/

Author information

Corresponding author

Correspondence to Juan Gutiérrez-Cárdenas.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mantilla-Saavedra, C., Gutiérrez-Cárdenas, J. (2022). Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning. In: Lossio-Ventura, J.A., et al. Information Management and Big Data. SIMBig 2021. Communications in Computer and Information Science, vol 1577. Springer, Cham. https://doi.org/10.1007/978-3-031-04447-2_17

  • DOI: https://doi.org/10.1007/978-3-031-04447-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04446-5

  • Online ISBN: 978-3-031-04447-2

  • eBook Packages: Computer Science (R0)
