Abstract
In recent years, suicide has become one of the most critical public health issues among teenagers and adults. At the same time, the growth and spread of social networks and mobile devices have made it possible to compile information that helps us understand the thoughts, feelings, and emotions expressed on these platforms. The detection of suicidal traits on social media has become a relevant research topic, since it permits the identification of probable suicidal traits among users by examining their posts on well-known social networks such as Reddit. The purpose of the present research is therefore to compare different supervised classification models, such as Logistic Regression, Support Vector Machines (SVM), Random Forest, AdaBoost, Gradient Boosting, and XGBoost, together with feature extraction techniques such as TF-IDF and GloVe. The results of our experiments show that the best model is SVM with TF-IDF, which obtained 91.50% Accuracy, 92.40% Precision, 90.30% Recall, and a 91.50% F1-score. This study also shows that TF-IDF outperforms GloVe for feature extraction when applied to the different models tested.
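The four metrics reported in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not state which averaging scheme the authors used, so this example assumes the simplest case: a binary task with a single positive class. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Assumption: binary labels, no averaging across classes.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels (1 = post flagged as containing suicidal traits).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

In a screening setting such as this one, recall is often watched closely because a missed positive post (a false negative) is usually the more costly error; precision and F1 show what that sensitivity costs in false alarms.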
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mantilla-Saavedra, C., Gutiérrez-Cárdenas, J. (2022). Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning. In: Lossio-Ventura, J.A., et al. Information Management and Big Data. SIMBig 2021. Communications in Computer and Information Science, vol 1577. Springer, Cham. https://doi.org/10.1007/978-3-031-04447-2_17
DOI: https://doi.org/10.1007/978-3-031-04447-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04446-5
Online ISBN: 978-3-031-04447-2
eBook Packages: Computer Science, Computer Science (R0)