Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning

Mantilla-Saavedra, Camila; Gutiérrez-Cárdenas, Juan

doi:10.1007/978-3-031-04447-2_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1577)

Included in the following conference series:

Annual International Conference on Information Management and Big Data

Abstract

In recent years, suicide has become one of the most critical public health issues among teenagers and adults. At the same time, the growth and widespread adoption of social networks and mobile devices have made it possible to compile information from these platforms that helps us understand users' thoughts, feelings, and emotions. Detecting suicidal traits on social media has therefore become a relevant research topic, since examining posts on well-known social networks such as Reddit permits the identification of probable suicidal traits among their users. The purpose of the present research is to compare supervised classification models, namely Logistic Regression, Support Vector Machines (SVM), Random Forest, AdaBoost, Gradient Boosting, and XGBoost, together with the feature extraction techniques TF-IDF and GloVe. Our experiments show that the best model is SVM with TF-IDF, which obtains 91.50% accuracy, 92.40% precision, 90.30% recall, and a 91.50% F1-score. The study also shows that TF-IDF outperforms GloVe for feature extraction across the models tested.
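
As a reading aid for the four metrics reported in the abstract, the following minimal sketch shows how accuracy, precision, recall, and F1 are conventionally computed for a binary classifier. The function name `classification_metrics` and the toy labels are illustrative assumptions and are not taken from the chapter; the chapter's own evaluation code and averaging scheme are not visible in this preview.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    Hypothetical helper for illustration only; `positive` is the label
    treated as the class of interest (e.g. a comment flagged as at-risk).
    """
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("y_true and y_pred must not be empty")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for this example, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters in this setting because a missed at-risk comment (a false negative) and a false alarm (a false positive) have different costs, and accuracy alone does not separate them.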

Author information

Authors and Affiliations

Universidad de Lima, Lima, Peru
Camila Mantilla-Saavedra & Juan Gutiérrez-Cárdenas

Corresponding author

Correspondence to Juan Gutiérrez-Cárdenas.

Editor information

Editors and Affiliations

National Institutes of Health, Bethesda, MD, USA
Juan Antonio Lossio-Ventura
Visibilia, São Paulo, Brazil
Jorge Valverde-Rebaza
Universidad Peruana de Ciencias Aplicadas, Lima, Peru
Eduardo Díaz
ENSIIE and SAMOVAR, Evry, France
Denisse Muñante
The Open University, Milton Keynes, UK
Carlos Gavidia-Calderon
Federal University of São Carlos, São Carlos, Brazil
Alan Demétrius Baria Valejo
University of Engineering and Technology UTEC, Lima, Peru
Hugo Alatrista-Salas

Cite this paper

Mantilla-Saavedra, C., Gutiérrez-Cárdenas, J. (2022). Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning. In: Lossio-Ventura, J.A., et al. Information Management and Big Data. SIMBig 2021. Communications in Computer and Information Science, vol 1577. Springer, Cham. https://doi.org/10.1007/978-3-031-04447-2_17

DOI: https://doi.org/10.1007/978-3-031-04447-2_17
Published: 20 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04446-5
Online ISBN: 978-3-031-04447-2
eBook Packages: Computer Science, Computer Science (R0)
