
1 Introduction

According to the American Foundation for Suicide Prevention (AFSP), suicide is the 10th leading cause of death in the USA [1]. Many organizations, such as the Trevor Project and the National Suicide Prevention Hotline, provide services over phone, chat, and text message in hopes of de-escalating a person's suicidal state or directing them to other available mental health and suicide prevention service providers [17]. According to Internet World Statistics, there are over 320,000,000 internet users in the United States alone [7], and the AFSP reports that 44,193 Americans die by suicide each year [1].

Sentiment Analysis is the field of study that analyzes a person's opinions, emotions, and attitudes toward some entity and its attributes in written text [10]. Sentiment Analysis tools such as Linguistic Inquiry and Word Count (LIWC) can analyze a text and return measures that characterize its psychological state [9]. Sentiment analysis has been applied to the detection of suicidal thoughts, otherwise known as suicidal ideation, using publicly available social media data from Twitter, Facebook, Tumblr, etc., as well as in clinical studies [4, 13]. To the best of the authors' knowledge, ours is the first study to predict the veracity of suicidal ideation from counseling transcripts.

Our study is a continuation in the use of sentiment analysis for suicidal ideation detection. The authors of [12] define suicide as the act of intentionally ending one's own life. Suicidal ideation refers to thoughts of engaging in behavior intended to end one's life. [12] also distinguishes suicide from self-injury, the latter referring to behavior without intent to die, such as self-mutilation and self-harm. We perform sentiment analysis on a corpus of therapy transcripts from Alexander Street categorized by symptoms such as depression and suicidal ideation, use mutual information to select relevant sentiment analysis features from LIWC's Receptiviti Application Programming Interface (API), and train a machine learning classifier to automatically predict whether a text sample belongs to one of Alexander Street's suicide-related categories: suicidal ideation, self-harm, self-mutilation, and suicidal behavior.

2 Motivation and Related Work

A number of suicide awareness and prevention organizations exist in an effort to reduce the suicide rate [17], yet many people do not seek appropriate help due to social stigmatization. This lack of reporting, along with geographical and jurisdictional factors, leads to unreliable statistics on suicide, and even more so for nonfatal suicidal behavior (such as ideation) [5]. The authors of [12] state that people tend to underreport behavior related to sensitive or shameful topics such as drug use and suicidal behavior. When alternatives such as anonymous surveys are offered, rates of reported suicidal behavior increase.

The increase in communication through the internet has created diverse online communities. In [4], there are references to multiple studies related to suicide and the world wide web. Some studies focus on websites dedicated to engagement in suicidal behavior, while [8] reviews online resources for suicide intervention and prevention, concluding that more evaluation and development of such resources is needed. The prevalence of communication about suicide on the Internet is evident from the existence of bulletin boards, chat rooms, web forums, and newsgroups [4]. We hope that our research can provide a means of improving suicide prevention via the Internet.

The literature on sentiment analysis of suicide-related text has made use of social media data. In Burnap et al. [4], a number of machine classification models were developed with the goal of classifying text on Twitter relating to communications around suicide. Their classifier distinguished between suicidal ideation, defined as having thoughts of suicide [12], reporting of a suicide, memorial (condolence), campaigning (such as petitions), and support (information on suicide). The researchers built a set of baseline classifiers using the following features extracted from Twitter posts: (1) lexical features, such as the frequency of Part of Speech (POS) labels per tweet (nouns, verbs, etc.); (2) structural features, such as the use of negations in a sentence and the use of mention symbols in tweets (indicators of replies or reposts on Twitter); and (3) emotive and psychological features that could be found in statements expressing emotional distress [4]. This was achieved with the help of annotators, suicide researchers, and software tools like Linguistic Inquiry and Word Count (LIWC). By creating an ensemble classifier, a combination of base classifiers, they achieved a 0.728 overall F1-measure across all of their classes, and a 0.69 F1-measure for the suicidal ideation class. For the F1-measure, higher is better.

Our research is novel in that we are not using social media text or annotators. Instead, we have acquired a corpus of therapy session transcripts from Alexander Street [2], which have already been categorized by their symptom: suicidal ideation, suicidal behavior, self-harm, etc. We then applied sentiment analysis with LIWC's Receptiviti API to those transcripts, performed feature selection to obtain the most relevant features, and trained a number of machine learning classifiers to predict whether a sample transcript belongs to one of Alexander Street's suicidal categories.

3 Technical Approach

Our approach consisted of Data Collection and Text Processing, Sentiment Analysis, Feature Selection with Mutual Information, and Machine Learning (see Fig. 1).

Fig. 1. System overview.

3.1 Data Collection and Text Processing

We collected a total of 745 Transcripts from Alexander Street’s Counseling and Psychotherapy Transcripts Series [2]. These transcripts were plain text and categorized by symptoms and topics such as Anxiety, Depression, Shame, Suicidal Ideation, Suicidal Behavior, Self Harm and Self Mutilation (see Fig. 2).

Fig. 2. Overview of transcripts by suicidal and not suicidal.

The transcripts contained conversations between a therapist and their client in the following format:

THERAPIST: [therapist's utterance]
CLIENT: [client's utterance]

These transcripts were not always in the therapist-client order shown above, but they always contained both a therapist and a client in the conversation. For our experiment we required only the client's conversation text, which we extracted with a set of regular expression (RegEx) and pattern-matching rules. RegEx is a compact pattern language for describing and matching text. With RegEx we capture only the text following a 'CLIENT:' label, stopping when we reach a 'THERAPIST:' label. We performed this process on each of our transcripts, categorized by their symptom.
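The extraction step can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact rules; the sample transcript and the assumption that speaker labels appear verbatim as 'CLIENT:' and 'THERAPIST:' are ours.

```python
import re

def extract_client_text(transcript: str) -> str:
    """Collect only the client's utterances from a transcript whose
    turns are prefixed with 'CLIENT:' and 'THERAPIST:' labels."""
    # Capture everything after 'CLIENT:' up to the next 'THERAPIST:'
    # label (or the end of the transcript), across line breaks.
    pattern = re.compile(r"CLIENT:\s*(.*?)(?=THERAPIST:|$)", re.DOTALL)
    return " ".join(m.strip() for m in pattern.findall(transcript))

sample = (
    "THERAPIST: How have you been feeling this week?\n"
    "CLIENT: Not great. I have trouble sleeping.\n"
    "THERAPIST: Tell me more about that.\n"
    "CLIENT: I lie awake for hours.\n"
)
print(extract_client_text(sample))
# Not great. I have trouble sleeping. I lie awake for hours.
```

The non-greedy capture with a lookahead keeps each client turn intact even when it spans multiple lines, regardless of the order in which the two speakers appear.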

3.2 Sentiment Analysis

Once we had collected all of the transcripts and extracted the client text, we used Linguistic Inquiry and Word Count's Receptiviti API to perform sentiment analysis [15]. Receptiviti generates all of LIWC's variables plus an additional 50 validated measures of psychology, emotion, tone, sentiment, and more. For every transcript, each word is compared against a dictionary, and each dictionary identifies which words are associated with a certain psychology-related category [9]. Once an entire transcript has been processed, the percentage of total words that match each dictionary category is calculated. These categories are the features used in our experiment. Complete transcripts were processed one at a time, and we used the 124 returned features as the sample for each transcript.
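The dictionary-percentage computation can be illustrated with a toy example. The two category word lists below are stand-ins we invented for illustration; the real LIWC/Receptiviti dictionaries are proprietary and far larger.

```python
from collections import Counter

# Toy dictionaries standing in for LIWC/Receptiviti categories
# (hypothetical word lists; the real dictionaries are proprietary).
DICTIONARIES = {
    "negative_emotion": {"sad", "hopeless", "worthless", "hurt"},
    "social": {"friend", "family", "talk", "alone"},
}

def liwc_style_features(text: str) -> dict:
    """Return, per category, the percentage of total words that
    appear in that category's dictionary."""
    words = text.lower().split()
    total = len(words)
    counts = Counter(words)
    features = {}
    for category, vocab in DICTIONARIES.items():
        hits = sum(counts[w] for w in vocab)
        features[category] = 100.0 * hits / total if total else 0.0
    return features

print(liwc_style_features("i feel sad and hopeless and alone"))
```

Here 2 of the 7 words match the negative-emotion list (about 28.6%) and 1 matches the social list (about 14.3%); the real API returns 124 such percentages per transcript.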

3.3 Feature Selection

Receptiviti provides 124 features for each transcript submitted. These features range from part-of-speech labels and word count to sentiment-oriented features such as depression, health-oriented, and extraversion (outgoing). In order to identify the most relevant features for our classification task, we performed Feature Selection with Mutual Information using the scikit-learn Python library.

Mutual Information is a non-negative measure of the dependency between two random variables. It is zero if and only if the two variables are independent, and higher values mean stronger dependency [16]. Mutual Information allows us to select the features that contribute most to classifying a text as suicidal.

$$ I(X;Y) = \sum\nolimits_{x} \sum\nolimits_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} $$
(1)
$$ I(X;Y) \ge 0, \quad \text{with equality iff } p(x,y) = p(x)\,p(y) $$
(2)

where \( X \) and \( Y \) are random variables, \( p(x,y) \) is their joint distribution, and \( p(x)\,p(y) \) is the factored (marginal) distribution [11].
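Eq. (1) can be checked directly on empirical distributions. This sketch estimates mutual information from observed (x, y) pairs; the two synthetic datasets are our own illustrative examples.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X;Y) (in nats) from observed (x, y) pairs using
    empirical distributions, following Eq. (1)."""
    n = len(pairs)
    p_xy = Counter(pairs)
    p_x = Counter(x for x, _ in pairs)
    p_y = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in p_xy.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi

# Perfectly dependent binary variables: I(X;Y) = ln 2
dependent = [(0, 0), (1, 1)] * 50
# Independent binary variables: I(X;Y) = 0
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25

print(round(mutual_information(dependent), 4))    # 0.6931
print(round(mutual_information(independent), 4))  # 0.0
```

Note that scikit-learn's mutual information scorers for continuous features use a nearest-neighbor estimator rather than this direct plug-in formula, but the quantity being estimated is the same.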

In our experiment, we perform feature selection with Mutual Information and keep the features whose scores fall in the top 30th percentile. This was implemented with scikit-learn's SelectPercentile function, which takes a scoring function (Mutual Information) and a percentage N as parameters and returns the features in the top Nth percentile of scores. After this step, our 124 Receptiviti features were reduced to 37. The retained features include depression, openness, extraversion, and agreeableness; the removed features include word count, positive emotion, and drives.
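The selection step can be sketched with SelectPercentile. Since the Receptiviti feature matrix is not public, the sketch uses a synthetic 124-feature dataset as a stand-in.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, mutual_info_classif

# Synthetic stand-in for the 124 Receptiviti features per transcript.
X, y = make_classification(n_samples=200, n_features=124,
                           n_informative=20, random_state=0)

# Keep the features scoring in the top 30% by mutual information.
selector = SelectPercentile(
    score_func=partial(mutual_info_classif, random_state=0),
    percentile=30,
)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (200, 124) -> (200, 37)
```

With 124 features, the top 30% is 37 features, matching the reduction reported above.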

3.4 Machine Learning

Similar to the work in [4], we used a Support Vector Machine (SVM), Decision Trees (C4.5), and Naïve Bayes. Both [4] and [6] list the SVM classifier as yielding promising results when classifying depression. The following scikit-learn classifiers were used in our machine learning step: Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors (K-NN), C4.5 [14], Naïve Bayes, and a linear SVM.

4 Experimental Results

To evaluate our classifiers, we performed 10-fold Cross Validation. Cross Validation partitions the data into folds: in each iteration, a classifier is trained on all but one fold and tested on the remaining held-out fold, so every sample is used for testing exactly once. When randomly generating folds, we ensured that the a-priori rate of both classes was even, because the non-suicidal class was over-represented (Fig. 2). Figure 3 shows our metrics for each classifier.
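The evaluation loop can be sketched as follows. The dataset here is synthetic (standing in for the 37 selected features), and scikit-learn's DecisionTreeClassifier (CART) stands in for C4.5, which scikit-learn does not implement directly.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 37 selected sentiment features.
X, y = make_classification(n_samples=300, n_features=37, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "K-NN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),  # CART, in place of C4.5
    "Naive Bayes": GaussianNB(),
    "Linear SVM": LinearSVC(),
}

# Stratified folds keep the class ratio consistent in every partition.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stratification addresses class imbalance within each fold; fully evening the a-priori rate, as described above, would additionally require resampling the over-represented class.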

Fig. 3. Comparison of cross validation results for each classifier.

For each classifier, Accuracy measures the percentage of inputs in the test set that were correctly labeled. Figure 3 shows our highest accuracies, around 89%, for Logistic Regression, Linear Discriminant Analysis (LDA), K-Nearest Neighbors (K-NN), and the Linear Support Vector Machine (Linear SVM). All of the classifiers achieved an Accuracy above 80%.

Precision measures how many of the items we identified were relevant, and Recall measures how many of the relevant items were identified. These metrics take into account the False Positives (FP), irrelevant items incorrectly identified as relevant; False Negatives (FN), relevant items incorrectly identified as irrelevant; and True Positives (TP), relevant items correctly identified as relevant [3, 11].
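These definitions reduce to simple arithmetic on the confusion counts. The counts in the example below are hypothetical, chosen only to illustrate the computation.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute Precision, Recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)          # TP / (TP + FP)
    recall = tp / (tp + fn)             # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for one classifier on the suicidal class.
p, r, f1 = precision_recall_f1(tp=43, fp=15, fn=7)
print(f"Precision={p:.2f} Recall={r:.2f} F1={f1:.2f}")
# Precision=0.74 Recall=0.86 F1=0.80
```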

Our classifiers show a Precision of 74% for Logistic Regression and Linear SVM, 79% for LDA and K-NN. For Recall our classifiers show 86% for Logistic Regression, 83% for LDA and K-NN, and 86% for Linear SVM.

The F1-score is the harmonic mean of Precision and Recall [3, 11]. Our classifiers show an F1-score of 79% for Logistic Regression and Linear SVM, 80% for LDA, and 81% for K-NN.

Our results could be improved by obtaining sentiment analysis features for every paragraph in each transcript, and by keeping the ratio of suicidal to non-suicidal samples balanced. With such modifications we hope to increase the performance of our classifiers.

5 Conclusion

Electronic means of suicide prevention can be improved with research on the detection of suicidal ideation in text. In this paper we collected a corpus of therapy transcripts from Alexander Street, performed Sentiment Analysis with the help of LIWC's Receptiviti API, and trained machine learning classifiers to automatically predict whether a text belongs to a suicide-related category. Our classifiers achieved accuracies ranging from 80% to 89%, showing promise for detecting suicidal ideation in therapy transcripts. We hope our work will advance the effectiveness of electronic suicide prevention strategies in today's society.