Depression and anorexia detection in social media as a one-class classification problem

Applied Intelligence

Abstract

Taking advantage of the increasing amount of user-generated content in social media, several computational methods have been proposed for detecting people suffering from depression and anorexia. These complex tasks have been tackled as binary classification problems using, in most cases, automatically generated training data. Despite its promising results, this approach has some important drawbacks: it suffers from a severely skewed class distribution; the negative class is very diverse, since it attempts to model all kinds of healthy users; and, above all, the annotations are not completely certain, especially for the negative cases (i.e., healthy users). Motivated by these issues, in this paper we propose to address the detection of these disorders with a one-class classification (OCC) approach. In particular, we introduce two new instance-based OCC methods especially suited to handling the high diversity of content in social media documents. Drawing on the idea of gravitational attraction, these methods evaluate the relation between documents by their strengths, considering their distances as well as their masses (relevance) with respect to the target task. Experiments were conducted on depression and anorexia benchmark datasets. The obtained results are encouraging: the overall performance was better than that of other standard OCC methods, and competitive with state-of-the-art results from binary classification approaches.
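To make the gravitational analogy concrete, the following is a minimal sketch of strength-based one-class scoring. It is not the authors' method: Formula (1) is not reproduced in this preview, so the Newton-style form (product of masses over squared distance), the smoothing constant `eps`, the averaging over the `k` strongest attractions, and all function names are assumptions made for illustration. Only the use of cosine distance and the notion of mass as relevance come from the text.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat an empty document as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def strength(mass_a, mass_b, distance, eps=1e-9):
    """Newton-style attraction: product of masses over squared distance.

    eps is a hypothetical smoothing term that avoids division by zero
    when two documents are identical.
    """
    return mass_a * mass_b / (distance ** 2 + eps)


def occ_score(query_vec, query_mass, train_vecs, train_masses, k=3):
    """Average of the k strongest attractions between the query document
    and the documents of the single (positive) training class."""
    if not train_vecs:
        raise ValueError("at least one training document is required")
    strengths = sorted(
        (
            strength(query_mass, mass, cosine_distance(query_vec, vec))
            for vec, mass in zip(train_vecs, train_masses)
        ),
        reverse=True,
    )
    top = strengths[:k]
    return sum(top) / len(top)


# Toy usage: three positive-class documents as 2-d vectors with unit mass.
train_vecs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
train_masses = [1.0, 1.0, 1.0]
near_score = occ_score([1.0, 0.05], 1.0, train_vecs, train_masses, k=2)
far_score = occ_score([0.0, 1.0], 1.0, train_vecs, train_masses, k=2)
print(near_score > far_score)
```

In a sketch like this, a document would be accepted as belonging to the target class when its score exceeds a threshold chosen on held-out positive examples; how the paper sets that decision rule is not stated in this preview.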

Data Availability

The data used in the experiments were not collected by the authors of this manuscript. Readers should contact the original owners to obtain them.

Notes

  1. Groups who defend eating disorders as a lifestyle, often denoted as proana.

  2. http://clpsych.org/

  3. https://early.irlab.org/

  4. This component of Formula (1) can be computed using other measures such as the Euclidean distance. Indeed, we carried out experiments with different distance measures, obtaining a better performance when the cosine distance was used.

  5. We define a personal phrase as a sentence that contains a singular first-person pronoun.

  6. For example, the words listed in: https://github.com/first20hours/google-10000-english/blob/master/google-10000-english.txt

  7. The data can be obtained upon request. More information can be found at https://early.irlab.org/

  8. https://my.clevelandclinic.org/health/articles/9285-depression-glossary-of-depression-related-terms

  9. https://urbanthesaurus.org/synonyms/anorexia

  10. https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html

  11. The DB lexicon was discarded because it is computed from the training set, and its quality highly depends on the amount of training instances.
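As an illustration of the definition in note 5, the sketch below flags sentences that contain a singular first-person pronoun. The pronoun list, the regular-expression sentence splitter, and the function names are assumptions made for this example; the preview does not say how the authors segmented or tokenized the text.

```python
import re

# Assumed list of singular first-person pronouns (hypothetical; not from the paper).
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def is_personal_phrase(sentence):
    """Return True if the sentence contains a singular first-person pronoun."""
    # Lowercase and keep alphabetic runs only, so "I'm" yields the tokens "i" and "m".
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in FIRST_PERSON_SINGULAR for token in tokens)


def personal_phrases(text):
    """Split text into sentences with a naive rule and keep the personal ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s and is_personal_phrase(s)]


print(personal_phrases("I can't sleep. The weather is nice. It hurts me!"))
```

A word-list check like this will occasionally misfire (for example, "mine" used as a noun), so it should be read as a rough filter rather than a linguistic analysis.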

Funding

This research was partially supported by CONACYT: project grant FC-2016-2410, postdoctoral fellowship CVU-174410, and graduate scholarship CVU-814295.

Author information

Corresponding author

Correspondence to Rosa María Ortega-Mendoza.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Code Availability

Most of the code used in the experimental phase was developed by the authors.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Aguilera, J., Farías, D.I.H., Ortega-Mendoza, R.M. et al. Depression and anorexia detection in social media as a one-class classification problem. Appl Intell 51, 6088–6103 (2021). https://doi.org/10.1007/s10489-020-02131-2

  • DOI: https://doi.org/10.1007/s10489-020-02131-2
