Abstract
Depression is a common mental illness that must be detected and treated at an early stage to avoid serious consequences. Many existing methods and modalities for detecting depression involve a physical examination of the individual. Diagnosing mental health from a person's social media data can be more effective, as it avoids such examinations. Moreover, because people express their emotions freely on social media, it is desirable to use these data for diagnosis. Although many existing systems detect mental illness by analysing a person's social media data, detecting the level of depression is also important for further treatment. In this research, we therefore developed a gold standard data set that labels social media postings with three levels of depression: ‘not depressed’, ‘moderately depressed’ and ‘severely depressed’. Traditional learning algorithms were applied to this data set, and an empirical analysis is presented in this paper. A data augmentation technique was applied to overcome the class imbalance. Among the variations implemented, the model combining a Word2Vec vectorizer with a Random Forest classifier on the augmented data outperforms the others, scoring 0.877 for both accuracy and F1 measure.
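The abstract names the pipeline stages without showing how they fit together. The following is a minimal, standard-library-only Python sketch of three of them, under stated assumptions that are not taken from the paper: posts are already tokenized; each post is represented by averaging the Word2Vec vectors of its tokens (a common way to feed Word2Vec into a traditional classifier, but the abstract does not say this is what the authors did), with a hypothetical toy embedding table standing in for a trained model; class imbalance is handled by plain random oversampling, used here only as a stand-in because the abstract does not name its augmentation technique; and F1 is computed as support-weighted F1, since the abstract does not state the averaging used. All function names are hypothetical. The Random Forest itself is omitted; the averaged vectors could be passed to a classifier such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def average_vector(tokens, embeddings, dim):
    """Represent a post as the mean of the vectors of its in-vocabulary tokens.

    `embeddings` maps a word to a list of floats (assumed to come from a
    trained Word2Vec model); a post with no known words gets a zero vector.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every class
    is as large as the largest one (plain random oversampling, a stand-in
    for the unspecified augmentation technique)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with each class weighted by its support."""
    total = 0.0
    for c, n in Counter(y_true).items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += f1 * n
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy embeddings; a real pipeline would use a trained model.
    toy_embeddings = {"tired": [1.0, 0.0], "hopeless": [0.0, 1.0]}
    print(average_vector(["so", "tired", "hopeless"], toy_embeddings, dim=2))

    posts = [["good", "day"], ["feel", "fine"], ["so", "tired"], ["hopeless"]]
    gold = ["not depressed", "not depressed",
            "moderately depressed", "severely depressed"]
    _, balanced = random_oversample(posts, gold)
    print(Counter(balanced))

    predicted = ["not depressed", "moderately depressed",
                 "moderately depressed", "severely depressed"]
    print(round(accuracy(gold, predicted), 3),
          round(weighted_f1(gold, predicted), 3))
```

Oversampling of this kind would normally be applied to the training split only, so that duplicated posts do not end up in both training and test data. For the toy predictions in the demo, accuracy and weighted F1 both come out to 0.75; the two metrics do not coincide in general.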
Acknowledgements
We would like to thank the Department of Science and Technology - Science and Engineering Research Board (DST-SERB) for providing funds to annotate the collected data.
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Sampath, K., Durairaj, T. (2022). Data Set Creation and Empirical Analysis for Detecting Signs of Depression from Social Media Postings. In: Kalinathan, L., R., P., Kanmani, M., S., M. (eds) Computational Intelligence in Data Science. ICCIDS 2022. IFIP Advances in Information and Communication Technology, vol 654. Springer, Cham. https://doi.org/10.1007/978-3-031-16364-7_11
DOI: https://doi.org/10.1007/978-3-031-16364-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16363-0
Online ISBN: 978-3-031-16364-7
eBook Packages: Computer Science, Computer Science (R0)