Abstract
We discuss and analyze the process of creating word embedding feature representations designed for a learning task in which annotated data is scarce, such as depressive language detection from Tweets. We start from a rich word embedding pre-trained on a general dataset, then enhance it with an embedding learned from a domain-specific but much smaller dataset. The strengthened representation better portrays the domain of depression we are interested in, as it combines the semantics learned from the specific domain with the word coverage of the general language. We present a comparative analysis of our word embedding representations against a simple bag-of-words model, a well-known sentiment lexicon, a psycholinguistic lexicon, and a general pre-trained word embedding, based on their efficacy in accurately identifying depressive Tweets. We show that our representations achieve a significantly better F1 score than the others when applied to a high-quality dataset.
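To make the idea of combining general word coverage with domain-specific semantics concrete, here is a minimal sketch in Python. It is not the authors' method, which the abstract does not specify. It assumes one simple combination strategy: concatenating the two vectors for each word and zero-padding whichever side is missing. The function names `combine_embeddings` and `tweet_vector`, the toy vocabularies, and the vector values are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def combine_embeddings(
    general: Dict[str, Vector],
    domain: Dict[str, Vector],
    general_dim: int,
    domain_dim: int,
) -> Dict[str, Vector]:
    """Concatenate general and domain vectors word by word.

    Assumption (not from the paper): a word missing from one embedding
    gets a zero vector on that side, so every word in either vocabulary
    is covered by the combined representation.
    """
    combined: Dict[str, Vector] = {}
    for word in set(general) | set(domain):
        g = general.get(word, [0.0] * general_dim)
        d = domain.get(word, [0.0] * domain_dim)
        combined[word] = g + d  # list concatenation, not addition
    return combined


def tweet_vector(tokens: List[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy data: a 2-dimensional "general" embedding and a 1-dimensional
# "domain" embedding (both hypothetical).
general = {"sad": [0.1, 0.2], "movie": [0.5, 0.4]}
domain = {"sad": [0.9], "relapse": [0.7]}

combined = combine_embeddings(general, domain, general_dim=2, domain_dim=1)
features = tweet_vector(["sad", "relapse", "unknownword"], combined, dim=3)
print(features)
```

The resulting fixed-length `features` vector could then be passed to any standard classifier. Concatenation is chosen here only because it does not require the two embedding spaces to be aligned with each other.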
Acknowledgements
We thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Alberta Machine Intelligence Institute (AMII) for their generous support of this research. We thank Prof. Greg Kondrak for his valuable advice and Bradley Hauer for his helpful suggestions. We also thank Roberto Vega and Shiva Zamani for their contributions in implementing the standard text classification pipeline and the initial baseline experiments.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Farruque, N., Zaiane, O., Goebel, R. (2020). Augmenting Semantic Representation of Depressive Language: From Forums to Microblogs. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_22
DOI: https://doi.org/10.1007/978-3-030-46133-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46132-4
Online ISBN: 978-3-030-46133-1
eBook Packages: Computer Science (R0)