Abstract
This research focuses on identifying personality disorders in individuals from their social media text. We developed a unique collection of words (PD-Corpus) and a dataset (PD-TXT) of texts annotated with different personality disorder traits. Our goal was to classify these texts into six types of personality disorder using Natural Language Processing (NLP) classification models. The results showed that our transformer-based models, especially the BERT-base-uncased model, were more effective than traditional methods, achieving a 74.7% rate of correct classification. Our models also consistently outperformed baseline models from the existing literature on the PD-TXT dataset. This study presents a new way to predict personality disorders through linguistic analysis and highlights the potential for further research combining language studies with mental health.
Data availability
The PD-TXT data are available upon request.
Acknowledgements
The authors thank the psychologists Dr. Rajat Mitra and Dr. Puneet Jain for their collaboration.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Appendix
This appendix reports additional results from the experiments carried out in this study. Figures 6 and 7 show the training and validation loss and accuracy curves for each deep learning model employed in the experiments. The curves indicate that the BiLSTM model with Keras embedding converges properly and provides the highest performance across the training epochs.
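For readers unfamiliar with the two quantities plotted in such curves, the following is a minimal sketch of how accuracy and cross-entropy loss can be computed for a six-class classifier. It is an illustration under stated assumptions, not the authors' training code: the function names, the integer class indices, and the toy probabilities are all hypothetical, and the paper may compute its metrics differently.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of texts whose predicted class matches the annotated class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def cross_entropy(true_labels, probabilities, eps=1e-12):
    """Mean negative log-probability assigned to the annotated class."""
    if len(true_labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    total = 0.0
    for label, probs in zip(true_labels, probabilities):
        # Clamp to eps so a zero probability does not produce log(0).
        total += -math.log(max(probs[label], eps))
    return total / len(true_labels)


# Hypothetical toy data: indices 0..5 stand in for the six classes.
NUM_CLASSES = 6
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
y_true = [0, 3, 5, 2]
y_prob = [uniform, uniform, uniform, uniform]

# Predicted class = index of the largest probability (first index on ties).
y_pred = [max(range(NUM_CLASSES), key=probs.__getitem__) for probs in y_prob]

val_accuracy = accuracy(y_true, y_pred)
val_loss = cross_entropy(y_true, y_prob)
```

Evaluating both functions on the training split and on the validation split after every epoch gives the points of a loss curve and an accuracy curve. A model that outputs uniform probabilities over six classes, as in the toy data, has a loss of ln 6, which is a useful reference level when reading such plots.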
About this article
Cite this article
Jain, D., Arora, S., Jha, C.K. et al. Text classification models for personality disorders identification. Soc. Netw. Anal. Min. 14, 64 (2024). https://doi.org/10.1007/s13278-024-01219-8
DOI: https://doi.org/10.1007/s13278-024-01219-8