Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt

doi:10.1016/j.specom.2021.05.004

Speech Communication

Volume 132, September 2021, Pages 10-20

https://doi.org/10.1016/j.specom.2021.05.004

Highlights

• Investigates suicidal ideation and suicide attempt using speech-based parameters.
• Reveals statistically significant differences between healthy controls and inpatients exhibiting suicidal behavior with regard to voice quality and speech disfluency attributes.
• Demonstrates that voice quality and disfluency information can serve as a compact feature set for machine learning techniques, which can classify suicidal behavior with relatively high accuracy (i.e., up to 80%).

Abstract

Individuals who have incurred trauma due to a suicide attempt often acquire residual health complications, such as cognitive, mood, and speech-language disorders. Due to limited access to suicidal speech audio corpora, behavioral differences in patients with a history of suicidal ideation and/or behavior have not been thoroughly examined using subjective voice quality and manual disfluency measures. In this study, we examine the Butler-Brown Read Speech (BBRS) database that includes 20 healthy controls with no history of suicidal ideation or behavior (HC group) and 226 psychiatric inpatients with recent suicidal ideation (SI group) or a recent suicide attempt (SA group). During read-aloud sentence tasks, the SI and SA groups show poorer average subjective voice quality composite ratings than individuals in the HC group. In particular, the SI and SA groups exhibit average ‘grade’ and ‘roughness’ voice quality scores four to six times higher than those of the HC group. We demonstrate that manually annotated voice quality measures, converted into a low-dimensional feature vector, help to distinguish individuals with recent suicidal ideation and behavior from a healthy population, yielding an automatic classification accuracy of up to 73%. Furthermore, our novel investigation of manual speech disfluencies (e.g., manually detected hesitations, word/phrase repeats, malapropisms, speech errors, non-self-correction) shows that inpatients in the SI and SA groups produce on average approximately twice as many hesitations and four times as many speech errors as individuals in the HC group. We demonstrate automatic classification of inpatients with a suicide history from individuals with no suicide history with up to 80% accuracy using manually annotated speech disfluency features. The knowledge regarding voice quality and speech disfluency behaviors in individuals with a suicide history presented herein will lead to a better understanding of this complex phenomenon and thus contribute to the future development of new automatic speech-based suicide-risk identification systems.

Introduction

The majority of people who make suicide attempts do not die by suicide; in 2017 there were approximately 47,000 suicide deaths and an estimated 1,400,000 suicide attempts in the United States of America (CDC, 2017). Because of the potentially harmful nature of suicide methods, survivors of attempted suicide often exhibit a debilitating range of irreversible health concerns, such as neuropsychological, neuropsychiatric, physiological, cognitive, and speech-language disorders (Costache et al., 2004; Wazeer et al., 2015; Zabel et al., 2005). Brodnitz et al. (1971) was one of the first studies to report psychological problems found in patients with voice disorders. Their study of over 2000 patients, involving all forms of voice disorders, found that 80% of all voice disorder cases were attributed to vocal abuse and/or psychogenic factors (e.g., anxiety, depression). Further, Marmor et al. (2016) noted that depressive symptoms in patients were accompanied by a nearly two-fold increase in reported voice problems in the past year when compared to a healthy population.

Mood disturbances directly impact the speech system, triggering quantifiable divergences in normal healthy speech physiological mechanisms (e.g., respiratory, muscle tension, motor coordination). For individuals with clinical depression and/or those exhibiting suicidal behavior, recorded changes in their voice characteristics are frequently attributed to psychogenic emotional and stress symptoms (Cummins et al., 2015). Examples of psychogenic symptoms include psychomotor retardation and agitation. Disturbances caused by psychomotor retardation include poorer cognitive processing and muscular incoordination, which adversely impact gross/fine motor movement and speech production (Flint et al., 1993; Hoffman et al., 1985; Silverman et al., 1992). Moreover, psychomotor agitation results in abnormally accelerated motor activity and excessive gross/fine motor movements (Day, 1999). In investigations by France et al. (2000), Ellgring and Scherer (1996), and Yingthawornsuk et al. (2006), careful evaluation of abnormal acoustic vocal manifestations has helped to motivate new ways to automatically identify mood disorders in patients.

The speech-based literature on suicide, such as Cummins et al. (2015), Ozdas et al. (2004), Scherer et al. (2013), and Yingthawornsuk et al. (2006), has indicated that patients with a history of suicidal ideation exhibit lower acoustic energy, unusual glottal control, and breathy voice quality when compared with healthy populations. However, in these studies, only a relatively small number of clinically validated patients with suicidal ideation and/or attempts were analyzed (i.e., fewer than two dozen per study). Furthermore, in many of these studies, patients’ voice quality attributes were derived from spectral acoustic-based features rather than grounded in a standard set of clinical descriptive pathological qualities.

Scherer et al. (2013) examined ‘breathiness’ and ‘tenseness’ in a narrow demographic of adolescents with and without suicidal ideation and/or behavior. Their study found that, based on peak slope and normalized amplitude quotient acoustic feature values, the speech of adolescents with suicidal ideation and/or behavior exhibited significantly more breathy qualities than that of adolescents without suicidal behavior. However, Scherer et al. (2013) did not explore other common pathological voice quality attributes (i.e., hoarseness, roughness, instability), nor did they verify that these particular acoustic features captured only ‘breathiness’ quality information (i.e., they could also be capturing information from other voice quality attributes). Studies by Brodnitz et al. (1971), Marmor et al. (2016), and Scherer et al. (2013) hint that abnormal voice quality is associated with suicidal behavior. However, automatic speech-based studies have yet to investigate the several distinct pathological voice qualities associated with suicide in a more sizable suicidal dataset.

Studies by Esposito et al. (2016), Oxman et al. (1988), Rosenberg et al. (1991), and Rubino et al. (2011) have shown that clinical depression can be identified through patients’ spontaneous speech disfluencies. Further, Stasak et al. (2019) found that when compared with healthy controls, patients with clinical depression exhibited significantly greater numbers of speech disfluencies during specific emotionally charged read aloud sentence tasks. For patients with suicidal behavior, it is anticipated that during simple read sentence tasks this population will show an increase in speech disfluencies due to associated depression and cognitive dysfunction (Levens and Gotlib, 2015; Marzuk et al., 2005; Mitterschiffthaler et al., 2008; Roy-Byrne et al., 1986; Rubino et al., 2011; Weingartner et al., 1981).

The research presented herein is one of the largest-scale studies to date of inpatients with suicidal ideation and behavior; it investigates voice quality and speech disfluency behaviors found in such samples using text-dependent read-aloud elicitation with a range of mood content. As an elicitation protocol, read speech has many advantages over spontaneous speech because it: (1) constrains the phonetic variability; (2) controls the syntactic order of affective word content; (3) isolates a patient's cognitive-processing demands; (4) offers objective clinical repeatability; and (5) reduces potential patient-observer bias caused by interviewer-adaptation, which influences the speaking style of a participant (Bouhuys and Van Den Hoofdakker, 1991).

In this study, we investigate the subjective GRBASI voice pathology quality attributes (i.e., ‘grade’, ‘roughness’, ‘breathiness’, ‘asthenia’, ‘strain’, ‘instability’) to help establish which of these are most associated with the speech of psychiatric inpatients hospitalized for suicidal ideation or suicide attempt. In addition, the speech of healthy controls is compared with that of the psychiatric inpatients. We hypothesize that voice quality is an important indicator for individuals who are at higher risk for suicide, regardless of whether they exhibit depression. Based on the previous literature (Costache et al., 2004; Wazeer et al., 2015; Zabel et al., 2005), we hypothesize that inpatients with a history of recent suicide attempts will exhibit abnormal voice qualities along with language processing and production difficulties. We further theorize that descriptive voice quality and speech disfluency measures can be applied as discriminative low-dimensional features to help automatically distinguish individuals with no suicide history from psychiatric inpatients with a recent history of suicidal thoughts or behaviors.
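To make the idea of a low-dimensional feature vector concrete, the following is a minimal Python sketch and not the authors' pipeline. It assumes each of the six GRBASI attributes is scored as an integer from 0 (normal) to 3 (severe) and that the six scores are simply concatenated. The classifier actually used is not stated in this excerpt; a k-nearest-neighbor vote is swapped in purely for illustration. The function names `to_vector` and `knn_predict`, the value of `k`, and all ratings below are hypothetical and were invented for this example; none of them come from the BBRS database.

```python
from collections import Counter
from math import dist

# Attribute order used for every feature vector in this sketch (an assumption).
GRBASI = ("grade", "roughness", "breathiness", "asthenia", "strain", "instability")


def to_vector(ratings):
    """Convert a dict of GRBASI ratings (0=normal ... 3=severe) to a 6-D tuple."""
    vector = []
    for name in GRBASI:
        score = ratings[name]
        if not isinstance(score, int) or not 0 <= score <= 3:
            raise ValueError(f"{name} must be an integer from 0 to 3, got {score!r}")
        vector.append(score)
    return tuple(vector)


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Sort labeled vectors by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def rate(*scores):
    """Helper: build a feature vector from six scores given in GRBASI order."""
    return to_vector(dict(zip(GRBASI, scores)))


# Hypothetical ratings for illustration only; NOT data from the BBRS database.
train = [
    (rate(0, 0, 0, 0, 0, 0), "HC"),
    (rate(0, 0, 1, 0, 0, 0), "HC"),
    (rate(1, 0, 0, 0, 0, 0), "HC"),
    (rate(2, 2, 1, 0, 1, 0), "SI/SA"),
    (rate(2, 1, 0, 1, 1, 1), "SI/SA"),
    (rate(3, 2, 1, 0, 2, 1), "SI/SA"),
]

query = rate(2, 2, 0, 0, 1, 0)
print(knn_predict(train, query))  # prints "SI/SA" for this toy data
```

With only six integer dimensions, distances are cheap to compute and easy to inspect by hand, which is one practical appeal of a compact, manually annotated feature set. Any real evaluation would need held-out data and a proper accuracy estimate rather than a single toy query.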

Section snippets

Database

The Butler-Brown Read Speech (BBRS) database is a privately collected speech corpus consisting of recordings of participants reading a set of sentences into a microphone. All participants were recorded at a psychiatric hospital in the northeastern United States of America. The BBRS database was developed to investigate verbal behaviors of inpatients hospitalized for recent suicidal ideation (SI group) or suicide attempts (SA group), along with a group of healthy controls recruited from the

Voice quality assessment measures

Our subjective voice quality scale was based on the GRBASI voice quality evaluation (Yamauchi et al., 2010). The GRBASI perceptual evaluation scale is one of the most common assessments for pathological voice quality. The GRBASI is based on a four-point scale (i.e., 0-normal, 1-mild, 2-moderate, 3-severe), and requires a human listener to subjectively score six voice quality attributes: ‘grade’ (hoarseness), ‘roughness’ (vibration irregularity), ‘breathiness’ (air escaping), ‘asthenia’

Voice quality

An individual voice quality attribute analysis per group is shown below in Table 3. While some of the automatic speech-based literature (Cummins et al., 2015; Scherer et al., 2013) has reported increased ‘breathiness’ in individuals exhibiting depression and/or suicidal behavior, our manual evaluation of the BBRS database found that inpatients with suicide attempt history had considerably higher ‘grade’ and ‘roughness’ attributes. For instance, both the SI and SA inpatient groups

Conclusion

Our study examined voice quality and speech disfluency behaviors found in psychiatric inpatients with a suicide history and healthy controls with no suicide history in the BBRS database. When compared with healthy controls, an analysis of voice qualities in inpatients exhibiting suicidal ideation or behavior yielded valuable new insights, revealing that they have increased ‘grade’ and ‘roughness’ voice qualities (i.e., not only ‘breathiness’ as indicated in previous literature). Further, we

Declaration of Competing Interest

None.

Acknowledgements

The authors would like to thank Butler Hospital and the Warren Alpert Medical School of Brown University for providing the BBRS dataset. This research was partly made possible by funding from the National Institutes of Health (Grants: R01MH108610; R01MH095786; R01MH097741). We also thank Lifeline Harbour to Hawkesbury in Sydney, Australia for their safety-wellness ‘Accidental Counselor’ training certification. Additionally, we thank Nancy Briggs with the UNSW Mark Wainwright Analytic Centre.

References (52)

B. Barsties et al. Assessment of voice quality: current state-of-the-art. Auris Nasus Larynx (2015)
I.V. Bele. Reliability in perceptual analysis of voice quality. J. Voice (2005)
A.L. Bouhuys et al. The interrelatedness of observed behavior of depressed patients and of a psychiatrist: an ethological study on mutual influence. J. Affect. Disord. (1991)
V.S. Costache et al. Complete tracheal rupture after a failed suicide attempt. Ann. Thorac. Surg. (2004)
N. Cummins et al. A review of depression and suicide risk assessment using speech analysis. Speech Commun. (2015)
R.K. Day. Psychomotor agitation: poorly defined and badly measured. J. Affect. Disord. (1999)
A.J. Flint et al. Abnormal speech articulation, psychomotor retardation, and subcortical dysfunction in major depression. J. Psych. (1993)
H. Jiang et al. Investigation of different speech types and emotions for detecting depression using different classifiers. Speech Commun. (2017)
S.M. Levens et al. Updating emotional content in recovering depressed individuals: evaluating deficits in emotion processing following a depressive episode. J. Behav. Ther. Exp. Psych. (2015)
C.K.W. Schotte et al. Cluster analytic validation of the DSM melancholic depression. The threshold model: integration of quantitative and qualitative distinctions between unipolar depressive subtypes. Psych. Res. (1997)
B. Stasak et al. Automatic depression classification based on affective read sentences: opportunities for text-dependent analysis. Speech Commun. (2019)
A.T. Beck et al. Beck Depression Inventory-II (1996)
D.W. Brook et al. Drug use and the risk of major depressive disorder, alcohol dependence, and substance use disorders. Arch. Gen. Psychiatry (2002)
Centers for Disease Control (CDC), 2017. Centers for disease control and prevention data & statistics fatal injury...
C. Cortes et al. Support-vector networks. Mach. Learn. (1995)
S. Couch et al. Vocal effectiveness of speech-language pathology students: before and after voice use during service delivery. S. Afr. J. Comm. Disord. (2015)
L. de Araújo Pernambuco et al. Prevalence of voice disorders in the elderly: a systematic review of population-based studies. Eur. Arch. Oto-Rhino-Laryngol. (2014)
P.H. Dejonckere et al. Differential perceptual evaluation of pathological voice quality: reliability and correlations with acoustic measurements. Rev. Laryngol. Otol. Rhinol. (Bord.) (1996)
H. Ellgring et al. Vocal indicators of mood change in depression. J. Nonverbal. Behav. (1996)
A. Esposito et al. On the significance of speech pauses in depressive disorders: results on read and spontaneous narratives (2016)
D. Fay et al. Malapropisms and the structure of the mental lexicon. Linguist. Inq. (1977)
P.A. Frewen et al. Visual-verbal self/other-referential processing task: direct vs. indirect assessment, valence, and experimental correlates. Pers. Individ. Dif. (2011)
N.M. Henriksson et al. Am. J. Psychiatry (1993)
G.M.A. Hoffman et al. Speech pause time as a method for the evaluation of psychomotor retardation in depressive illness. British J. Psych. (1985)
M.-R. Islam et al. Detecting depression using k-nearest neighbors (KNN) classification technique
A. Kataria et al. A review of data classification using k-nearest neighbor algorithm. Intern. J. Emerg. Technol. Adv. Eng. (2013)
