Augmenting Semantic Representation of Depressive Language: From Forums to Microblogs

Farruque, Nawshad; Zaiane, Osmar; Goebel, Randy

doi:10.1007/978-3-030-46133-1_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11908)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

We discuss and analyze the process of creating word embedding feature representations specifically designed for a learning task in which annotated data is scarce, such as depressive language detection from Tweets. We start from a rich word embedding pre-trained on a general dataset, then enhance it with an embedding learned from a domain-specific but much smaller dataset. The strengthened representation better portrays the domain of depression we are interested in, as it combines the semantics learned from the specific domain with the word coverage of the general language. We present a comparative analysis of our word embedding representations against a simple bag-of-words model, a well-known sentiment lexicon, a psycholinguistic lexicon, and a general pre-trained word embedding, based on their efficacy in accurately identifying depressive Tweets. We show that our representations achieve a significantly better F1 score than the others when applied to a high-quality dataset.
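As a rough illustration of combining general word coverage with domain-specific semantics, the sketch below concatenates a general embedding with a domain embedding and averages the word vectors of a Tweet. This is not the method of the chapter, whose details are not given in the abstract. Concatenation with zero padding is an assumption chosen for simplicity, and the names `augment`, `tweet_vector`, and the toy vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def augment(general: Dict[str, Vector], domain: Dict[str, Vector]) -> Dict[str, Vector]:
    """Concatenate general and domain vectors over the union of both vocabularies.

    Assumption: a word missing from one embedding gets a zero vector for that part.
    """
    g_dim = len(next(iter(general.values())))
    d_dim = len(next(iter(domain.values())))
    combined: Dict[str, Vector] = {}
    for word in set(general) | set(domain):
        g_part = general.get(word, [0.0] * g_dim)
        d_part = domain.get(word, [0.0] * d_dim)
        combined[word] = g_part + d_part  # list concatenation, not addition
    return combined


def tweet_vector(tokens: List[str], embedding: Dict[str, Vector]) -> Vector:
    """Average the vectors of the tokens found in the embedding (zeros if none)."""
    dim = len(next(iter(embedding.values())))
    known = [embedding[t] for t in tokens if t in embedding]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy data (hypothetical): a 2-d "general" and a 1-d "domain" embedding.
general_emb = {"sad": [1.0, 0.0], "happy": [0.0, 1.0]}
domain_emb = {"sad": [0.5], "hopeless": [0.9]}

combined_emb = augment(general_emb, domain_emb)
features = tweet_vector(["sad", "hopeless", "unknownword"], combined_emb)
```

The resulting `features` vector could then be passed to any standard classifier. Out-of-vocabulary tokens are skipped here, which is one of several reasonable choices.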

Acknowledgements

We thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Alberta Machine Intelligence Institute (AMII) for their generous support of this research. We thank Prof. Greg Kondrak for his valuable advice and Bradley Hauer for his helpful suggestions. We also thank Roberto Vega and Shiva Zamani for their contributions in implementing a standard text classification pipeline and the initial baseline experiments.

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
Nawshad Farruque, Osmar Zaiane & Randy Goebel

Corresponding author

Correspondence to Nawshad Farruque.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

Cite this paper

Farruque, N., Zaiane, O., Goebel, R. (2020). Augmenting Semantic Representation of Depressive Language: From Forums to Microblogs. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_22

DOI: https://doi.org/10.1007/978-3-030-46133-1_22
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46132-4
Online ISBN: 978-3-030-46133-1
eBook Packages: Computer Science, Computer Science (R0)