Data Set Creation and Empirical Analysis for Detecting Signs of Depression from Social Media Postings

  • Conference paper
Computational Intelligence in Data Science (ICCIDS 2022)

Abstract

Depression is a common mental illness that must be detected and treated at an early stage to avoid serious consequences. Many methods and modalities for detecting depression involve physical examination of the individual. However, diagnosing mental health from a person's social media data is more effective because it avoids such physical examinations. Since people also express their emotions well on social media, it is desirable to diagnose their mental health from this data. Although many existing systems detect a person's mental illness by analysing their social media data, detecting the level of depression is also important for further treatment. In this research, we therefore developed a gold standard data set for detecting the level of depression in social media postings, labelled as ‘not depressed’, ‘moderately depressed’ or ‘severely depressed’. Traditional learning algorithms were applied to this data set, and an empirical analysis is presented in this paper. A data augmentation technique was applied to overcome the data imbalance. Among the several variations implemented, the model with a Word2Vec vectorizer and a Random Forest classifier on augmented data outperforms the others, with a score of 0.877 for both accuracy and F1 measure.
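
To make the two reported measures concrete, the following is a minimal sketch of how accuracy and F1 can be computed for the three depression levels. The abstract does not state which F1 averaging was used, so both macro and support-weighted averages are shown as assumptions. The function names and the toy labels are hypothetical and are not taken from the paper or its data set.

```python
from collections import Counter

# The three class labels named in the abstract.
LABELS = ["not depressed", "moderately depressed", "severely depressed"]


def accuracy(y_true, y_pred):
    """Fraction of postings whose predicted label matches the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred, labels=LABELS):
    """F1 for each label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1 weighted by each class's gold support (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    return sum(scores[l] * support[l] for l in labels) / len(y_true)


# Toy, made-up gold and predicted labels for six postings (illustration only).
N, M, S = LABELS
y_true = [N, N, N, M, M, S]
y_pred = [N, N, M, M, M, S]

print("accuracy   :", round(accuracy(y_true, y_pred), 3))
print("macro F1   :", round(macro_f1(y_true, y_pred), 3))
print("weighted F1:", round(weighted_f1(y_true, y_pred), 3))
```

On imbalanced data the macro average gives the minority classes the same weight as the majority class, whereas the weighted average follows the class distribution, so the two can differ noticeably.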

Acknowledgements

We would like to thank the Department of Science and Technology - Science and Engineering Research Board (DST-SERB) for providing funds to annotate the collected data.

Author information

Corresponding author

Correspondence to Kayalvizhi Sampath.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Sampath, K., Durairaj, T. (2022). Data Set Creation and Empirical Analysis for Detecting Signs of Depression from Social Media Postings. In: Kalinathan, L., R., P., Kanmani, M., S., M. (eds) Computational Intelligence in Data Science. ICCIDS 2022. IFIP Advances in Information and Communication Technology, vol 654. Springer, Cham. https://doi.org/10.1007/978-3-031-16364-7_11

  • DOI: https://doi.org/10.1007/978-3-031-16364-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16363-0

  • Online ISBN: 978-3-031-16364-7

  • eBook Packages: Computer Science (R0)
