A Two-Step Approach for the Prediction of Mood Levels Based on Diary Data

Bremer, Vincent; Becker, Dennis; Genz, Tobias; Funk, Burkhardt; Lehr, Dirk

doi:10.1007/978-3-030-10997-4_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11053)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Analyzing diary data can provide insights into patients suffering from mental disorders and can help personalize online interventions. We propose a two-step approach for such an analysis. First, we categorize free-text diary data into activity categories by applying a bag-of-words approach and explore recurrent neural networks to support this task. Second, we develop partial ordered logit models with varying levels of heterogeneity among clients to predict their mood. We estimate the parameters of these models using MCMC techniques and compare the models regarding their predictive performance. This two-step approach increases the interpretability of the relationships between the various activity categories and the individual mood level.


1 Introduction

Mental health problems are increasing worldwide, and access to healthcare programs is limited. Internet-based interventions provide additional access and can help close the gap between treatment and demand [8]. In these interventions, participants often keep diaries in which they rate, for example, their mood level and also report their daily activities. Because activities as varied as walking a dog, volunteering, cleaning the house, or having a drink with friends affect mood in different and complex ways [9], we analyze the effects that different activities have on the mood level.

In this study, we propose a two-step approach for analyzing free-text diary data provided by participants of an online depression treatment [4]. The dataset consists of 9,192 diary entries from 440 patients. We use text-mining techniques to categorize the free text into predefined activity categories (exercise, sickness, rumination, work related, recreational, necessary, social, and sleep related activities) and use individualized partial ordered logit models to predict the mood level. This two-step approach keeps the effects of the activity categories on the mood level interpretable. Thus, besides studying these relationships, we contribute to the field of machine learning by proposing a mixed-method approach for analyzing diary data. This short paper is based on a full paper already published in [3], which provides more information on the methods, results, and discussion, as well as a full list of references.

2 Method

Figure 1 illustrates the two-step approach. In the first step, we use bag-of-words (BoW) categorization and extend its results by applying a recurrent neural network (RNN) [5] to categorize the free text into activity categories. We split all diary entries into sentences and identify the most frequent 1- and 2-grams (at least 10 occurrences). Next, two of the authors manually assign each frequent 1- and 2-gram to an activity category. Only the n-grams that both authors assign to the same category are used for the BoW categorization. Each sentence is then assigned to one or more activity categories based on the categorized n-grams it contains. Because 8,032 sentences contain none of these n-grams, they cannot be categorized in this way. We therefore train an Elman network (RNN) on the categorized sentences and use it to classify the sentences that the BoW categorization leaves unassigned. Some sentences still remain unassigned because they consist of words that do not appear in the training corpus. Both the results of the BoW categorization alone and the merged results of both approaches serve as input for the second step.
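The BoW step described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' code: the helper names (`tokenize`, `ngrams`, `frequent_ngrams`, `agreed_lexicon`, `categorize`), the regex tokenizer, and the representation of each annotator's labels as a dictionary are assumptions. Only the 10-occurrence threshold, the restriction to 1- and 2-grams, the two-annotator agreement rule, and the multi-label assignment come from the text.

```python
import re
from collections import Counter

MIN_COUNT = 10  # frequency threshold for 1- and 2-grams, as stated in the text


def tokenize(sentence):
    # Assumption: lowercase word tokens via a simple Unicode-aware regex
    return re.findall(r"\w+", sentence.lower())


def ngrams(tokens):
    # All 1-grams plus all 2-grams (adjacent token pairs joined by a space)
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


def frequent_ngrams(sentences, min_count=MIN_COUNT):
    # Count n-grams over all sentences; keep those occurring at least min_count times
    counts = Counter(g for s in sentences for g in ngrams(tokenize(s)))
    return {g for g, c in counts.items() if c >= min_count}


def agreed_lexicon(labels_a, labels_b):
    # Keep only n-grams that both annotators mapped to the same activity category
    return {g: cat for g, cat in labels_a.items() if labels_b.get(g) == cat}


def categorize(sentence, lexicon):
    # A sentence can receive several categories; an empty set means "uncategorized"
    return {lexicon[g] for g in ngrams(tokenize(sentence)) if g in lexicon}


if __name__ == "__main__":
    # Hypothetical annotator labels; "headache" is dropped because they disagree
    lexicon = agreed_lexicon(
        {"walked": "exercise", "friends": "social", "headache": "sickness"},
        {"walked": "exercise", "friends": "social", "headache": "rumination"},
    )
    print(sorted(categorize("Walked the dog and met friends", lexicon)))
    # ['exercise', 'social']
```

Sentences for which `categorize` returns an empty set correspond to the uncategorized sentences that, in the described pipeline, are handed to the RNN.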

Because the mood level is rated on a scale from 1 to 10, we use a partial ordered logit model to predict the mood level and to analyze the effects of the assigned activity categories on it. The ordered logit model is based on the proportional odds assumption (POA), which means that each independent variable has the same effect on the outcome variable across all ranks of the mood level [7]. The partial ordered logit model, in contrast, allows the effects of variables that violate this assumption to vary across ranks. We test the assumption with a likelihood ratio test. The logit is then calculated as follows:

$$\ln(\theta_{ijt}) = \alpha_{ij} - \left( \underbrace{\sum_{a \in A_1} \beta_{aj}\, x_{ajt}}_{\text{POA holds}} + \underbrace{\sum_{a \in A_2} \beta_{aij}\, x_{ajt}}_{\text{POA violated}} \right),$$

where $\alpha_{ij}$ is the $i$-th threshold between adjacent ranks of the mood level for participant $j$, with $i = 1, \ldots, I = 9$ (ten mood levels require nine thresholds) and $j = 1, \ldots, J = 440$. The activities of participant $j$ at time $t$ are represented by $x_{ajt}$, where $A_1 =$ {sleep related, recreational activities} and $A_2 =$ {exercise, sickness, rumination, social, work related, necessary activities}. The parameters to be estimated are the $\beta$-coefficients. The index $j$ in $\alpha_{ij}$ addresses the problem of scale usage heterogeneity [6], that is, participants may use the rating scale differently. Additionally, we hypothesize that the effects of the activities vary among participants; we therefore also include client-specific $\beta$-parameters. As a robustness check, we also implement the partial ordered logit model without heterogeneity among participants (Model 1), with only individual $\alpha$-parameters (Model 2), with only client-specific $\beta$-parameters (Model 3), and with both heterogeneity terms as specified above (Model 4). We compare these four models regarding their predictive performance.
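To make the model equation concrete, the following Python sketch turns one participant's parameters into probabilities for the ten mood levels. It assumes, since the text does not define it, that $\theta_{ijt}$ denotes the cumulative odds $P(\text{mood} \le i)/P(\text{mood} > i)$ of the standard cumulative logit parametrization, so that $P(\text{mood} \le i) = 1/(1 + e^{-(\alpha_{ij} - \eta_{ijt})})$, where $\eta_{ijt}$ is the bracketed sum. Under this assumption a positive $\beta$ shifts probability mass toward higher mood levels. The function names, the dictionary inputs, the parameter values, and the use of the expected value as a point prediction are hypothetical; the paper does not state how the point predictions for RMSE and MAE are derived.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def mood_probabilities(alpha_j, beta_poa_j, beta_free_j, x_t):
    """Probabilities of mood levels 1..I+1 for participant j at time t (sketch).

    alpha_j:     list of the I thresholds alpha_ij (I = 9 for a 1-10 scale)
    beta_poa_j:  {activity: beta_aj} for activities in A_1 (POA holds)
    beta_free_j: {activity: [beta_aij for i = 1..I]} for activities in A_2
    x_t:         {activity: x_ajt}; activities missing from x_t count as 0
    """
    shared = sum(b * x_t.get(a, 0.0) for a, b in beta_poa_j.items())
    cumulative = []
    for i, alpha in enumerate(alpha_j):
        # eta_ijt is the bracketed term; A_2 effects differ per threshold i
        eta = shared + sum(bs[i] * x_t.get(a, 0.0) for a, bs in beta_free_j.items())
        cumulative.append(logistic(alpha - eta))  # assumed P(mood <= i + 1)
    bounds = [0.0] + cumulative + [1.0]
    # P(mood = k) = P(mood <= k) - P(mood <= k - 1). With threshold-specific
    # betas the cumulative values need not be monotone, so a difference can
    # become negative; this sketch does not handle that case.
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


def expected_mood(probs):
    # Hypothetical point prediction: expected mood level on the 1..10 scale
    return sum((k + 1) * p for k, p in enumerate(probs))


alpha = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]  # illustrative values
beta_poa = {"recreational": 0.5}                            # illustrative, A_1
beta_free = {"sickness": [-1.0] * 9}                        # illustrative, A_2
baseline = mood_probabilities(alpha, beta_poa, beta_free, {})
sick_day = mood_probabilities(alpha, beta_poa, beta_free, {"sickness": 1})
print(expected_mood(baseline), expected_mood(sick_day))
```

In a hierarchical setting such as Models 2 to 4, `alpha_j` and the $\beta$-dictionaries would simply hold participant-specific draws from the MCMC output rather than shared values.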

3 Results and Discussion

We compare the models using the Deviance Information Criterion (DIC), which is particularly suited to Bayesian models estimated by MCMC methods [2]. The DIC indicates superior performance for the model that includes both heterogeneity terms. However, according to [1], the DIC can be prone to selecting overfitted models. We therefore also run an out-of-sample test: before training the models, we randomly hold out mood entries (680 sentences) and their corresponding activities. We then predict the mood level of the individuals in the test data and use the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) as performance indicators. We also report these measures for a so-called Mean Model, which uses the average mood level of the training set (in this case, mood level 6) as the prediction for every entry in the test set.
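The evaluation quantities mentioned above can be sketched as follows. The DIC uses its standard definition $\mathrm{DIC} = \bar{D} + p_D$ with $p_D = \bar{D} - D(\bar{\theta})$, where $\bar{D}$ is the posterior mean of the deviance ($-2$ times the log-likelihood) over the MCMC draws and $D(\bar{\theta})$ is the deviance at the posterior mean of the parameters; lower values are preferred. The function names and inputs are hypothetical, and because the paper does not say whether the Mean Model's average is rounded, this sketch returns the unrounded training mean.

```python
import math
from statistics import fmean


def dic(deviance_draws, deviance_at_posterior_mean):
    # deviance_draws: D(theta^(s)) = -2 * log-likelihood for each MCMC draw s
    d_bar = fmean(deviance_draws)              # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean   # effective number of parameters
    return d_bar + p_d                         # lower DIC is preferred


def rmse(y_true, y_pred):
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mean_model(train_moods, n_test):
    # Baseline: predict the training-set average mood for every test entry
    return [fmean(train_moods)] * n_test


train = [5, 6, 7, 6]  # illustrative training moods, not data from the study
test = [4, 6, 8]      # illustrative held-out moods
baseline = mean_model(train, len(test))
print(rmse(test, baseline), mae(test, baseline))
```

Because RMSE squares the errors, it penalizes occasional large misses more heavily than MAE, which is why reporting both gives a fuller picture of the models' errors.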

Table 1. Model comparison with levels of heterogeneity for each text-mining approach.


As Table 1 shows, the prediction error decreases as the degree of heterogeneity increases. The activities additionally classified by the RNN do not improve performance. A possible reason is that the RNN's training data, which stem from the BoW categorization, may not be accurate enough for the RNN to generate new knowledge. Model 4 with the BoW categorization achieves the best predictive performance; we therefore use this model to examine the relationships between the activities and the mood level.

We find that sickness has a strong, significant negative effect on mood. Furthermore, our analysis suggests that rumination affects the mood level negatively, whereas social activities have a positive effect. The effects of the other activity categories are not significant. These results are consistent with the literature in the field [9]. At ECML PKDD, we will additionally present the results of a model that predicts the mood level directly from the free-text data.

References

[1] Ando, T.: Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94(2), 443–458 (2007)
[2] Berg, A., Meyer, R., Yu, J.: Deviance information criterion for comparing stochastic volatility models. J. Bus. Econ. Stat. 22(1), 107–120 (2004)
[3] Bremer, V., Becker, D., Funk, B., Lehr, D.: Predicting the individual mood level based on diary data. In: 25th European Conference on Information Systems, ECIS 2017, Guimarães, Portugal, 5–10 June 2017, p. 75 (2017)
[4] Buntrock, C., et al.: Evaluating the efficacy and cost-effectiveness of web-based indicated prevention of major depression: design of a randomised controlled trial. BMC Psychiatry 14, 25–34 (2014)
[5] Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
[6] Johnson, T.R.: On the use of heterogeneous thresholds ordinal regression models to account for individual differences in response style. Psychometrika 68(4), 563–583 (2003)
[7] McCullagh, P.: Regression models for ordinal data. J. R. Stat. Soc. Ser. B 42(2), 109–142 (1980)
[8] Saddichha, S., Al-Desouki, M., Lamia, A., Linden, I.A., Krausz, M.: Online interventions for depression and anxiety - a systematic review. Health Psychol. Behav. Med. 2(1), 841–881 (2014)
[9] Weinstein, S.M., Mermelstein, R.: Relations between daily activities and adolescent mood: the role of autonomy. J. Clin. Child Adolesc. Psychol. 36(2), 182–194 (2007)


Author information

Authors and Affiliations

Institute of Information Systems, Leuphana University, Lüneburg, Germany
Vincent Bremer, Dennis Becker, Tobias Genz & Burkhardt Funk
Institute of Psychology, Leuphana University, Lüneburg, Germany
Dirk Lehr


Corresponding author

Correspondence to Vincent Bremer.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
National University of Ireland, Galway, Ireland
Edward Curry
IBM Research - Ireland, Dublin, Ireland
Elizabeth Daly
University College Dublin, Dublin, Ireland
Brian MacNamee
Nokia (Ireland), Dublin, Ireland
Alice Marascu
Vodafone, Milan, Italy
Fabio Pinelli
IBM Research - Ireland, Dublin, Ireland
Michele Berlingerio
University College Dublin, Dublin, Ireland
Neil Hurley


About this paper

Cite this paper

Bremer, V., Becker, D., Genz, T., Funk, B., Lehr, D. (2019). A Two-Step Approach for the Prediction of Mood Levels Based on Diary Data. In: Brefeld, U., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science(), vol 11053. Springer, Cham. https://doi.org/10.1007/978-3-030-10997-4_39


DOI: https://doi.org/10.1007/978-3-030-10997-4_39
Published: 18 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10996-7
Online ISBN: 978-3-030-10997-4
eBook Packages: Computer Science, Computer Science (R0)
