Abstract
Taking advantage of the increasing amount of user-generated content in social media, some computational methods have already been proposed for detecting people suffering from depression and anorexia. Such complex tasks have been tackled as a binary classification problem using, in most cases, automatically generated training data. Despite its promising results, this approach has some important drawbacks, namely: it suffers from a severely skewed class distribution, the negative class is very diverse since it attempts to model all kinds of healthy users, and, above all, there is not a complete certainty about annotations, especially for the negative cases (i.e., healthy users). Motivated by these issues, in this paper, we propose to face the detection of these disorders following a one-class classification (OCC) approach. Particularly, we introduce two new instance-based OCC methods especially suited to manage the high diversity of content from social media documents. Taking up ideas from the gravitational attraction force, these methods evaluate the relation of documents by their strengths, considering their distances as well as their masses (relevance) with respect to the target task. Experiments were conducted on depression and anorexia benchmark datasets. The obtained results are encouraging; the overall performance was better than the results from other standard OCC methods, and competitive with regard to state-of-the-art results from binary classification approaches.
Similar content being viewed by others
Data Availability
The data exploited for experimental purposes have not been collected by the authors of this manuscript. We kindly refer to the original owners for obtaining them.
Notes
Groups who defend eating disorders as a lifestyle, often denoted as proana.
This component of Formula (1) can be computed using other measures such as the Euclidean distance. Indeed, we carried out experiments with different distance measures, obtaining a better performance when the cosine distance was used.
We define a personal phrase as a sentence that contains a singular first-person pronoun.
For example, the words listed in: https://github.com/first20hours/google-10000-english/blob/master/google-10000-english.txt
The data can be obtained upon request. More information can be found in https://early.irlab.org/
The DB lexicon was discarded because it is computed from the training set, and its quality highly depends on the amount of training instances.
References
Agarwal S, Sureka A (2015) Using KNN and SVM based one-class classifier for detecting online radicalization on Twitter. In: Proceedings of the 11th international conference on distributed computing and internet technology - volume 8956, ICDCIT 2015. Springer, Berlin, pp 431–442
Aguilera J, González LC, Montes-y-Gómez M, Rosso P (2019) A new weighted k-nearest neighbor algorithm based on Newton’s gravitational force. In: Vera-Rodriguez R, Fierrez J, Morales A (eds) Progress in pattern recognition, image analysis, computer vision, and applications. Springer International Publishing, Cham, pp 305–313
Aguilera J, González LC, Montes-y-Gomeź, M. López R, Escalante HJ (2020) From Neighbors to Strengths - The k-Strongest Strengths (kSS) Classification Algorithm. Pattern Recognition Letters 136:301–308
Alam S, Sonbhadra SK, Agarwal S, Nagabhushan P (2020) One-class support vector classifiers: a survey. Knowl-Based Syst 196:105754
Aragón ME, López-Monroy AP, González-Gurrola LC, Montes-y-Gómez M (2019) Detecting depression in social media using fine-grained emotions. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). Association for Computational Linguistics, Minneapolis, pp 1481–1486
Benavoli A, Mangili F, Corani G, Zaffalon M, Ruggeri F (2014) A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In: Proceedings of the 31st international conference on international conference on machine learning - volume 32, ICML’14, pp II–1026–II–1034. JMLR.org
Birnbaum ML, Ernala SK, Rizvi AF, De Choudhury M, Kane JM (2017) A collaborative approach to identifying social media markers of schizophrenia by employing machine learning and clinical appraisals. J Med Internet Res 19(8):e289
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguis 5:135–146
Burdisso SG, Errecalde M, Gómez MM (2019) A text classification framework for simple and effective early depression detection over social media streams. Expert Syst Appl 133:182– 197
Cabral GG, De Oliveira ALI (2014) One-class classification for heart disease diagnosis. In: 2014 IEEE International conference on systems, man, and cybernetics (SMC), pp 2551– 2556
Calvo RA, Milne DN, Hussain MS, Christensen H (2017) Natural language processing in mental health applications using non-clinical texts. Nat Lang Eng 23(5):649–685
Chancellor S, De Choudhury M (2020) Methods in predictive techniques for mental health status on social media: a critical review. npj Digit Med 3(1):43
Chen X, Sykora MD, Jackson TW, Elayan S (2018) What about mood swings: identifying depression on Twitter with temporal measures of emotions. In: Companion proceedings of the web conference 2018, WWW ’18, pp 1653–1660
Coppersmith G, Dredze M, Harman C (2014) Quantifying mental health signals in Twitter. In: Proceedings of the workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp 51–60
De Choudhury M (2015) Anorexia on Tumblr: a characterization study. In: Proceedings of the 5th international conference on digital health 2015. Association for Computing Machinery, New York, pp 43–50
De Choudhury M, Counts S, Horvitz E (2013) Social media as a measurement tool of depression in populations. In: Proceedings of the 5th annual ACM web science conference. Association for Computing Machinery, New York, pp 47–56
Guntuku SC, Yaden DB, Kern ML, Ungar LH, Eichstaedt JC (2017) Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci 18:43–49
Hussain J, Satti FA, Afzal M, Khan WA, Bilal HSM, Ansaar MZ, Ahmad HF, Hur T, Bang J, Kim J, Park GH, Seung H, Lee S (2020) Exploring the Dominant Features of Social Media for Depression Detection. J Inf Sci 46(6):739–759
Husseini Orabi A, Buddhitha P, Husseini Orabi M, Inkpen D (2018) Deep learning for depression detection of Twitter users. In: Proceedings of the fifth workshop on computational linguistics and clinical psychology: from keyboard to clinic. Association for Computational Linguistics, New Orleans, pp 88–97
Irigoien I, Sierra B, Arenas C (2014) Towards application of one-class classification methods to medical data. Sci World J 2014:730712
Islam MR, Kabir MA, Ahmed A, Kamal ARM, Wang H, Ulhaq A (2018) Depression detection from social network data using machine learning techniques. Health Inform Sci Syst 6(1):8
Itani S, Lecron F, Fortemps P (2020) A one-class classification decision tree based on kernel density estimation. Appl Soft Comput 91:106250
Joffe E, Pettigrew EJ, Herskovic JR, Bearden CF, Bernstam EV (2015) Expert guided natural language processing using one-class classification. J Am Med Inform Assoc 22(5):962–966
Khan SS, Ahmad A (2018) Relationship between variants of one-class nearest neighbors and creating their accurate ensembles. IEEE Trans Knowl Data Eng 30(09):1796–1809
Khan SS, Madden MG (2014) One-class classification: taxonomy of study and review of techniques. Knowl Eng Rev 29(3):345–374
Kim J, Lee J, Park E, Han J (2020) A deep learning model for detecting mental illness from user content on social media. Sci Rep 10(1):11846
Koppel M, Schler J (2004) Authorship verification as a one-class classification problem. In: Proceedings of the twenty-first international conference on machine learning. Association for Computing Machinery, New York, p 62
Li A, Jiao D, Zhu T (2018) Detecting depression stigma on social media: a linguistic analysis. J Affect Disord 232:358–362
Losada DE, Crestani F (2016) A test collection for research on depression and language use. In: Conference labs of the evaluation forum. Springer, pp 28–39
Losada DE, Crestani F, Parapar J (2017) eRISK 2017: CLEF lab on early risk prediction on the internet: experimental foundations. In: Experimental IR meets multilinguality, multimodality, and interaction - proceedings of the 8th international conference of the CLEF association, pp 346–360
Losada DE, Crestani F, Parapar J (2018) Overview of eRisk – early risk prediction on the internet. In: Experimental IR meets multilinguality, multimodality, and interaction. Proceedings of the ninth international conference of the CLEF Association. Avignon
Losada DE, Crestani F, Parapar J (2019) Overview of eRisk 2019. Early risk prediction on the internet. In: 10th International conference of the CLEF association. Springer, pp 340–357
Manevitz LM, Yousef M (2002) One-class SVMs for document classification. J Mach Learn Res 2:139–154
Martínez-Castaño R, Pichel JC, Losada DE (2020) A big data platform for real time analysis of signs of depression in social media. Int J Environ Res Public Health 17(13):4752
Mazhelis O (2006) One-class classifiers: a review and analysis of suitability in the context of mobile-masquerader detection. South African Comput J 36:29–48
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Bengio Y, LeCun Y (eds) 1st International conference on learning representations, ICLR 2013. Workshop Track Proceedings
Mirończuk MM, Protasiewicz J (2018) A recent overview of the state-of-the-art elements of text classification. Expert Syst Appl 106:36–54
Mohammadi E, Amini H, Kosseim L (2019) Quick and (maybe not so) easy detection of anorexia in social media posts. In: Working notes of CLEF 2019 - conference and labs of the evaluation forum. Lugano
Mounika N, Vaijayanthi P (2017) Analysis of algorithms for one class classification of heart disease identification. In: 2017 2nd International conference on communication and electronics systems (ICCES), pp 907–912
Norris ML, Boydell KM, Pinhas L, Katzman DK (2006) Ana and the internet: a review of pro-anorexia websites. Int J Eating Disorders 39(6):443–447
Ortega-Mendoza RM, López-Monroy AP, Franco-Arcega A, Montes-y-Gómez M (2018) Emphasizing personal information for author profiling: new approaches for term selection and weighting. Knowl-Based Syst 145:169–181
Park M, McDonald D, Cha M (2013) Perception differences between the depressed and non-depressed users in Twitter. In: Proceedings of the 7th international conference on weblogs and social media (ICWSM 2013), pp 476– 485
Pennington J, Socher R, Manning C (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, pp 1532–1543
Ranganathan AAH, Thenmozhi D, Aravindan C (2019) Early detection of anorexia using RNN-LSTM and SVM classifiers. In: Working notes of CLEF 2019 - conference and labs of the evaluation forum, Lugano
Schölkopf B, Platt JC, Shawe-Taylor JC, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471
Shen JH, Rudzicz F (2017) Detecting anxiety through Reddit. In: Proceedings of the fourth workshop on computational linguistics and clinical psychology — from linguistic signal to clinical reality, Vancouver, pp 58–65
Spinczyk D, Nabrdalik K, Rojewska K (2018) Computer aided sentiment analysis of anorexia nervosa patients’ vocabulary. BioMedical Engineering OnLine, 17
Strous R, Koppel M, Fine J, Nachliel S, Shaked G, Zivotofsky A (2009) Automated characterization and identification of schizophrenia in writing. J Nervous Mental Disease 197:585–8
Swan N, Schmidt U, Tchanturia K (2012) An experimental investigation of verbal expression of emotion in anorexia and bulimia nervosa. European eating disorders review: The journal of the Eating Disorders Association, 20
Tahir B, Amjad K, Firdous S, Mehmood MA (2018) Public health surveillance system for online social networks using one-class text classification. In: 2018 6th international conference on control engineering information technology (CEIT), pp 1–6
Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1):24–54
Trotzek M, Koitka S, Friedrich C (2018) Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia. In: Experimental IR meets multilinguality, multimodality, and interaction. Proceedings of the ninth international conference of the CLEF association (CLEF 2018), Avignon
Wang T, Brede M, Ianni A, Mentzakis E (2017) Detecting and characterizing eating-disorder communities on social media. In: Proceedings of the tenth ACM international conference on web search and data mining, WSDM ’17. Association for Computing Machinery, New York, pp 91–100
Wang YT, Huang HH, Chen HH (2018) A neural network approach to early risk detection of depression and anorexia on social media text. CEUR Workshop Proceedings, p 2125
Wolf M, Theis F, Kordy H (2013) Language use in eating disorder blogs: psychological implications of social online activity. J Lang Soc Psychol 32(2):212–226
Yan H, Fitzsimmons-Craft EE, Goodman M, Krauss M, Das S, Cavazos-Rehg P (2019) Automatic detection of eating disorder-related social media posts that could benefit from a mental health intervention. International Journal of Eating Disorders (July), 1–7
Zhang Y, Zhang B, Coenen F, Xiao J, Lu W (2014) One-class kernel subspace ensemble for medical image classification. EURASIP J Adv Signal Process 2014(1):17
Funding
This research was partially supported by CONACYT: project grant FC-2016-2410, postdoctoral fellowship CVU-174410, and graduate scholarship CVU-814295.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Code Availability
Most of the code used in the experimental phase was developed by the authors.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Aguilera, J., Farías, D.I.H., Ortega-Mendoza, R.M. et al. Depression and anorexia detection in social media as a one-class classification problem. Appl Intell 51, 6088–6103 (2021). https://doi.org/10.1007/s10489-020-02131-2
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-020-02131-2