
Text classification models for personality disorders identification

  • Original Article
Social Network Analysis and Mining

Abstract

This research focuses on identifying personality disorders in individuals from their social media text. We developed a dedicated collection of words (PD-Corpus) and a dataset (PD-TXT) of texts labelled with different personality disorder traits. Our goal was to classify these texts into six types of personality disorders using Natural Language Processing (NLP) classification models. The transformer-based models, especially BERT-base-uncased, were more effective than traditional methods, achieving a 74.7% success rate in correctly classifying these disorders. Our models also consistently outperformed baseline models from the existing literature on the PD-TXT dataset, with significant improvements. This study presents a new way to predict personality disorders through linguistic analysis and highlights the potential for further research combining language studies with mental health.

Data availability

The PD-TXT data are available upon request.

Notes

  1. https://keras.io/.

  2. http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.

Acknowledgements

The authors thank the psychologists Dr. Rajat Mitra and Dr. Puneet Jain for their collaboration.

Author information

Corresponding author

Correspondence to Deepti Jain.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This section includes additional results related to the experiments carried out in this study. Figures 6 and 7 show the training and validation loss and accuracy curves for each deep learning model employed in the experiments. It is evident that the BiLSTM model with Keras embedding converges properly and provides the highest performance across the training epochs.

Fig. 6

Loss and accuracy curves for each deep learning model with Keras embeddings. The values shown on each curve are averaged over five folds across 10 epochs

Fig. 7

Loss and accuracy curves for each deep learning model with GloVe embeddings. The values shown on each curve are averaged over five folds across 10 epochs
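
The captions above describe curves whose values are averaged over the folds at each epoch. As a minimal sketch of that averaging step, and not the authors' actual code, the following Python function takes one list of per-epoch metric values per fold and returns the per-epoch mean. The function name `average_over_folds` and the sample numbers are hypothetical and chosen only for illustration (two folds and three epochs rather than five folds and 10 epochs).

```python
from statistics import mean


def average_over_folds(fold_histories):
    """Average a per-epoch metric across cross-validation folds.

    fold_histories holds one inner list per fold; each inner list has
    one metric value (e.g. validation accuracy) per training epoch.
    """
    n_epochs = len(fold_histories[0])
    if any(len(history) != n_epochs for history in fold_histories):
        raise ValueError("every fold must report the same number of epochs")
    # zip(*...) groups together the values that every fold recorded
    # for the same epoch, so each mean is taken across folds.
    return [mean(epoch_values) for epoch_values in zip(*fold_histories)]


# Hypothetical validation accuracies (illustrative values, not results
# from the paper): 2 folds x 3 epochs.
example_histories = [
    [0.25, 0.50, 0.75],
    [0.50, 0.75, 0.875],
]
averaged_curve = average_over_folds(example_histories)
```

Plotting `averaged_curve` against the epoch index would give one smoothed curve per model, which is the kind of summary the figure captions describe.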

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jain, D., Arora, S., Jha, C.K. et al. Text classification models for personality disorders identification. Soc. Netw. Anal. Min. 14, 64 (2024). https://doi.org/10.1007/s13278-024-01219-8

  • DOI: https://doi.org/10.1007/s13278-024-01219-8
