1 Introduction

Findings from a large number of clinical studies [7, 10] have linked cardiovascular disorders to psychological stressors. In particular, long-term exposure to stressful environments, such as those commonly found in offices, has been shown to be a major risk factor for major depressive disorder (MDD), a disorder that affects an estimated 350 million people worldwide [11]. In response, we have witnessed a surge in the development of ambient intelligence systems for mental health and stress monitoring [1].

1.1 HRV Stress Analysis

Historically, heart rate variability (HRV) spectrograms have been used by medical practitioners as the primary tool for externally measuring cardiac neural regulation. This practice grew out of findings that mental and physical states have measurable effects on cardiac rhythm that can be captured easily and unintrusively with electrocardiographs (ECG) [8]. Over the past few decades, studies have established that the connection between HRV and the sympatho-vagal stress response is visible in both the time and frequency domains. Through experiments and empirical analysis, these works have identified indices that correlate strongly with physical and mental stress, particularly in the high- and low-frequency ranges [12].

Despite our growing understanding of physiological signals, there is still no widely accepted standard for stress evaluation based on HRV features. Recent surveys [9, 16] report neurobiological evidence linking some components of HRV to activity in cortical regions involved in stress appraisal, as well as to other brain mechanisms. As it stands, one of the main challenges is identifying which components of HRV are relevant for evaluating stress.

1.2 Deep Learning and Heart Rate Analysis

Traditionally, affective analysis, such as that performed for stress detection, has focused mostly on visual or auditory measures [17], where deep learning has proven effective. In recent years, other works in the field of ECG analysis for medical applications have also pushed for end-to-end applications of deep learning.

In one of the most prominent works [14], the authors used a 34-layer residual convolutional neural network (CNN) to analyze raw ECG sequences and identify 14 different types of arrhythmia. Using a very large dataset of over 60,000 recordings, they trained a model that performed better on average than the cardiologists recruited for the experiment. Their work demonstrates how deep learning can identify structural patterns in heart rate data.

In the current work, we attempt to exclude morphological patterns and instead focus on frequency-related features of heartbeats. Toward this end, we introduce the use of short-time Lomb-Scargle (LS) spectrograms to train CNNs.

2 Methodology

This work aims to use spectral HRV data representations together with deep learning models to predict mental stress. Specifically, we use a sliding-window Lomb-Scargle approach to build a spectrogram of heart rate in lieu of the traditional short-time Fourier transform (STFT). We then take pre-built deep models designed for image recognition, compare them with simpler architectures, and train them to recognize stress. The following sections describe the experiments and tools used in this research.

2.1 Dataset

For the following experiments we use the work stress dataset introduced in a previous work [5]. It consists of ECG recordings of subjects performing naturalistic desk work in front of a PC. The physiological recording device was a wearable ECG sensor by Imec, which allowed unconstrained and continuous measurement of participants’ heart rate. The sensor is worn with leads placed diagonally over the heart, as shown in Fig. 1. RR intervals were then extracted from the raw ECG signal using a QRS detection toolkit [4] and used to build the HRV spectrograms.
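As a rough illustration of this preprocessing step (not the QRS detection toolkit of [4]), RR intervals can be obtained from a raw ECG trace with a simple peak-based sketch such as the one below; the sampling rate handling and detection thresholds are illustrative assumptions.

```python
# Illustrative sketch only: the paper uses a dedicated QRS detection toolkit [4];
# a simple amplitude/refractory-period peak detector stands in for it here.
import numpy as np
from scipy.signal import find_peaks

def ecg_to_rr(ecg, fs):
    """Return R-peak times (s) and RR intervals (s) from a raw ECG trace."""
    # Assume R peaks are the dominant positive deflections and enforce a
    # physiologically plausible refractory period of 0.4 s (< 150 bpm).
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs),
                          height=np.percentile(ecg, 95))
    r_times = peaks / fs          # R-peak timestamps in seconds
    rr = np.diff(r_times)         # RR intervals between successive beats
    return r_times[1:], rr        # timestamp each interval by its closing beat
```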

Fig. 1. The 2-lead wearable ECG sensor used in the experiments.

The dataset was gathered from four subjects, each contributing five separate hour-long sessions annotated with activity and stress labels as shown in Table 1. This provided a total of 20 hours of ECG data for training and testing.

Table 1. Stress annotations distribution

2.2 Signal Processing

Previous works on spectral analysis of HRV applied the fast Fourier transform (FFT) [19] to produce periodograms from which the power of different frequency bands can be measured and compared. However, one limitation of FFT-based analysis is that it assumes all samples are evenly spaced in time. This poses a fundamental issue for heart rate, since the RR intervals used to represent detected heart beats are not evenly sampled. A study [2] has shown that resampling RR interval data to enable FFT analysis can introduce artifacts and increase the error of the final PSD estimate. A better alternative is the Lomb-Scargle periodogram, a least-squares spectral method that does not require evenly sampled data.

Lomb-Scargle Periodogram. Periodograms are some of the most basic tools used for spectral analysis, the most common of which is the discrete Fourier transform (DFT). The DFT can be defined for any sampled dataset \(\{X(t_j),\, j=1,2,\ldots ,N_0\}\) as,

$$\begin{aligned} FT_X(\omega )=\sum _{j=1}^{N_0} X(t_j)\exp (-i \omega t_j). \end{aligned}$$
(1)

The periodogram can then be defined as,

$$\begin{aligned} \begin{aligned} P_X(\omega )&=\frac{1}{N_0}|FT_X(\omega )|^2 \\&= \frac{1}{N_0}\Bigg |\sum _{j=1}^{N_0} X(t_j)\exp (-i \omega t_j)\Bigg |^2 \\&= \frac{1}{N_0}\Bigg [\bigg (\sum _{j} X_j \cos \omega t_j\bigg )^2+\bigg (\sum _{j} X_j \sin \omega t_j\bigg )^2\Bigg ]. \end{aligned} \end{aligned}$$
(2)

This formulation [3], while sufficient for accurately estimating the spectrum of most signals, was found to be sensitive to noise and other artifacts [18]. In practice, additional spectral smoothing, such as spectral window functions, must be applied to reduce the variance of the spectrum. Most of these methods are designed for evenly sampled data. Although some can also be applied to the unevenly sampled case, such as RR intervals, doing so has been found to increase error and sensitivity to signal noise [13]. This can be mitigated by adding more data when calculating the spectra in order to improve the signal-to-noise ratio, but that may not be an option in data-scarce or time-critical domains. A reasonable alternative is the Lomb-Scargle periodogram [15], a method designed for calculating frequency spectra from unevenly sampled data streams such as heart beats. Its formulation closely resembles the classical periodogram but addresses some of the noise-sensitivity and spectral-leakage issues of the standard FFT-based estimate. Here, the periodogram is defined as,

$$\begin{aligned} P_X(\omega )=\frac{1}{2}\left\{ \frac{\left[ \sum _j X_j \cos \omega (t_j-\tau )\right] ^2}{\sum _j \cos ^2 \omega (t_j - \tau )} + \frac{\left[ \sum _j X_j \sin \omega (t_j-\tau )\right] ^2}{\sum _j \sin ^2 \omega (t_j-\tau )}\right\} \end{aligned}$$
(3)

with \(\tau \) defined as,

$$\begin{aligned} \tan (2\omega \tau ) = \left( \sum _j \sin 2 \omega t_j \right) \Bigg / \left( \sum _j \cos 2 \omega t_j \right) . \end{aligned}$$
(4)

This modified formulation outputs values similar to the classical periodogram in most cases, but it has simpler statistical behavior and is equivalent to the reduction of the sum of squares obtained by least-squares fitting of sine waves to the data [15]. It is also time-translation invariant and reduces to a representation similar to the DFT when the data are evenly spaced. This makes it ideal for HRV analysis, and it is the method used in this research. The next issue is how to handle the non-stationary nature of heart rate. For this, we construct spectrograms from the periodograms.
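As a minimal sketch of how such a periodogram can be computed over the standard HRV bands (roughly 0.003–0.4 Hz), the following uses scipy.signal.lombscargle, which expects angular frequencies; the frequency grid and band limits are illustrative choices rather than the exact settings used in this work.

```python
# Minimal sketch of an LS periodogram for an unevenly sampled RR series.
import numpy as np
from scipy.signal import lombscargle

def rr_lomb_scargle(t, rr, f_min=0.003, f_max=0.4, n_freqs=256):
    """Lomb-Scargle PSD estimate of an RR series.

    t  : beat timestamps in seconds (uneven spacing is fine)
    rr : RR intervals in seconds, one per timestamp
    """
    freqs = np.linspace(f_min, f_max, n_freqs)    # Hz, spanning VLF..HF bands
    ang = 2.0 * np.pi * freqs                     # lombscargle expects rad/s
    power = lombscargle(t, rr - rr.mean(), ang)   # Eq. (3), with tau as in Eq. (4)
    return freqs, power
```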

Spectrogram Analysis. Spectrograms are visual representations of frequency spectra over time. They are used extensively in applications dealing with audio and radio signals, and recently they have also seen frequent application in the analysis of physiological signals such as brainwaves and heart rate. The most common method for building spectrograms is the short-time Fourier transform (STFT). Put simply, it is built by applying the DFT over a pre-defined window of data and sliding the windowed function over the data series with a fixed amount of overlap. Formally, for any series \(x(n)\) we can define the STFT as,

$$\begin{aligned} X_w(mS,\omega ) = \sum _{n=-\infty }^{\infty } x(n)w(n-mS)\exp (-j \omega n) \end{aligned}$$
(5)

for analysis window \(w\), frame index \(m\), and step size \(S\) [20]. Each value \(X_w(mS,\omega )\) is the Fourier transform of the windowed segment \(x(n)w(n-mS)\), and the results are stored in a matrix \(\hat{X}\) whose columns are Fourier periodograms \(P_X(\omega )\). To adapt the spectrogram to the unevenly sampled heart rate data, we replace this transform with the Lomb-Scargle formulation,

$$\begin{aligned} X_w(mS,\omega ) = \frac{1}{2}\left\{ \frac{\left[ \sum _j x(t_j) \cos \omega (t_j-\tau )\right] ^2}{\sum _j \cos ^2 \omega (t_j - \tau )} + \frac{\left[ \sum _j x(t_j) \sin \omega (t_j-\tau )\right] ^2}{\sum _j \sin ^2 \omega (t_j-\tau )}\right\} \end{aligned}$$
(6)

where the sums run over the beat samples \(t_j\) that fall within the \(m\)-th analysis window. In this way we mitigate the sensitivity of the standard STFT to uneven sampling while maintaining its ability to capture the non-stationary patterns in the temporal data. In effect, we end up with a high-dimensional representation that could potentially carry more expressive and appropriate features for detecting stress. The next step is to use machine learning to discover these features and compare the resulting performance with models based on traditional HRV features. Given the 2D spatial nature of this representation, convolutional neural networks should provide an efficient learning architecture.
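A minimal sketch of this short-time Lomb-Scargle construction is given below: a fixed-width window slides over the beat series and one LS periodogram is stacked per frame to form the matrix \(\hat{X}\). The window and step sizes shown are placeholders, not the settings used in the experiments.

```python
# Sketch of a short-time Lomb-Scargle spectrogram over an uneven RR series.
import numpy as np
from scipy.signal import lombscargle

def st_lomb_scargle(t, rr, win_s=360.0, step_s=30.0,
                    f_min=0.003, f_max=0.4, n_freqs=128):
    """Return (frame_times, freqs, spectrogram) for beat times t and RR values rr."""
    freqs = np.linspace(f_min, f_max, n_freqs)
    ang = 2.0 * np.pi * freqs
    frames, centers = [], []
    start, t_end = t[0], t[-1]
    while start + win_s <= t_end:
        mask = (t >= start) & (t < start + win_s)        # beats inside this window
        if mask.sum() > 2:                               # need enough beats to fit
            x = rr[mask] - rr[mask].mean()
            frames.append(lombscargle(t[mask], x, ang))  # one column of X-hat
            centers.append(start + win_s / 2.0)
        start += step_s
    return np.array(centers), freqs, np.column_stack(frames)
```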

3 Experiments and Validation

3.1 Dataset and Models

For the following experiments, we applied a sliding-window method to segment the stress dataset; this approach also lends itself to real-time processing in the future. When deciding on an appropriate window length, we observed that shorter segments tended to be more stable but not very expressive, while longer segments were more expressive but tended to overlap across samples with different labels. Through empirical testing, we found that 505-beat windows (\(\sim \)6 min) struck the best balance between stability and expressiveness while staying above the 5-minute minimum length recommended for accurate short-term HRV analysis (see Fig. 2).

Fig. 2. Image samples of extracted ECG spectrograms at different window widths.

We also applied a modal labelling scheme for window segments that overlapped multiple labels. A threshold of 70% was set to mark stable segments, which could be labelled with the modal value of the contained samples, and to separate them from “transitional” segments, which were considered more ambiguous. The final experiment dataset comprised approximately 1,500 windowed samples, each labelled either with the 4-level stress annotation or as a transition.
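The modal labelling rule can be sketched as follows; the `TRANSITION` sentinel and function name are illustrative rather than part of the original toolchain.

```python
# Sketch of the modal labelling rule: a window keeps the modal stress label only
# if that label covers at least 70% of its samples; otherwise it is marked as a
# "transitional" segment.
import numpy as np

TRANSITION = -1  # hypothetical label id for transitional segments

def window_label(sample_labels, threshold=0.70):
    values, counts = np.unique(sample_labels, return_counts=True)
    modal = values[np.argmax(counts)]
    if counts.max() / len(sample_labels) >= threshold:
        return int(modal)     # stable window: use the modal stress level
    return TRANSITION         # ambiguous window: mark as transitional
```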

For modeling, we trained perceptron and autoencoder models on the standard HRV features to serve as baselines and compared them to convolutional neural network (CNN) models trained on our spectral HRV representation. The convolutional layers were topped with a 128-unit dense layer to match the one used in the non-deep HRV stress models. For the CNN models, we tested both standard deep architectures typically used for image classification and shallower CNN architectures in order to verify the hierarchical feature requirements of the dataset. Performance was validated using stratified 10-fold cross-validation.
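The validation loop can be sketched with scikit-learn's StratifiedKFold; `build_model`, `X`, and `y` are placeholders for the models and windowed dataset described above, and an sklearn-style fit/score interface is assumed.

```python
# Sketch of stratified 10-fold cross-validation over the windowed dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(build_model, X, y, n_splits=10, seed=0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = build_model()                      # fresh model for each fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return np.mean(scores), np.std(scores)
```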

3.2 Results and Discussion

First, we analyze the performance of models trained on the standard HRV features (14 time- and frequency-domain features) and compare them to the deep ResNet [6] model trained on the LS spectrum data. As seen in Fig. 3, the baseline HRV-MLP model achieved 52% test accuracy, which improved to 60% when denoising autoencoders were applied for more efficient latent feature selection. These results are well above the 20% chance level on this 5-class dataset.

Fig. 3. Comparison of test performance results between models trained on standard HRV features (MLP and DA) and the ResNet model trained on STLST heart rate spectrum data.

Deep Spectral Model. For the spectral models, ResNet50 achieved over 80% testing accuracy, well above both of the traditional HRV baselines. These gains can be explained by the combination of the more expressive spectral data with the feature-learning ability of deep learning algorithms. The indices used in traditional HRV analysis help humans interpret physical conditions, but they may also obscure intricate features that are necessary for deeper affective analysis. By allowing the model to learn from a spatial-spectral representation, we are able to train a more effective model. However, this comes at the cost of increased complexity, since the standard 50-layer residual model requires over 25 million parameters. This not only raises data requirements but may also make interpretation and optimization more challenging.

Shallow Spectral Model. To help alleviate the problem of increased complexity, we attempted to train a more compact model. For these tests, the goal was to reduce the depth and overall complexity of the model without compromising performance. This culminated in a model featuring 5 intermediate 128-filter CNN layers with BatchNorm for regularization. The relatively shallow model also meant residual units were unnecessary, and training was much faster than for the deeper model. Tests were also performed on models shallower than 5 layers, but these typically led to larger performance variations and more noticeable overfitting. Figure 4 compares the learning process of the deep and shallow models. Despite using fewer than 1 million parameters, as opposed to the more than 25 million of ResNet50, the shallow model matches the performance of the much larger ResNet model while also avoiding overfitting. Training time could also be reduced by up to 5x, which may be useful in situations where occasional retraining is necessary.
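A minimal Keras sketch of such a shallow spectral model is shown below: five 128-filter convolutional layers with BatchNorm, a 128-unit dense layer, and a 5-way softmax (4 stress levels plus the transitional class). The kernel sizes, pooling, and input shape are illustrative assumptions rather than the exact configuration used here.

```python
# Sketch of a shallow 5-layer spectral CNN (illustrative hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers

def build_shallow_cnn(input_shape=(128, 64, 1), n_classes=5):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))      # spectrogram "image" input
    for _ in range(5):                                # 5 intermediate conv layers
        model.add(layers.Conv2D(128, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())        # BatchNorm for regularization
        model.add(layers.MaxPooling2D(pool_size=2))   # halve the spatial dimensions
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(128, activation="relu"))   # matches the 128-unit head
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With a single-channel input of roughly this size, such a configuration stays under one million parameters, consistent with the comparison above.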

Fig. 4. Learning and test performance of ResNet50 and the shallower 5-layer CNN model.

3.3 ROC Analysis

Next, we analyze how well the model discriminates between the different stress classes. For this we examine the receiver operating characteristic (ROC) curves for each class, using a one-versus-all methodology. As seen in Fig. 5, the deep ResNet model discriminates the extreme labels (i.e., very low and very high) very well, while the intermediate labels (low and high) are slightly more challenging to distinguish. This could indicate greater ambiguity in the physiological signals associated with moderate stress levels. Given that the stress levels are expected to be ordinal, it should be easier to separate classes at either end of the spectrum than those in the middle. This may be a sign that the features discovered by the model follow the same intended ordinality as the stress labels.

Alternatively, this could also be due to labelling inconsistencies common to psychological labels: it is simply easier to assign accurate labels to extreme emotions than to moderate ones. Future work should explore this further in order to determine a more efficient annotation methodology, or possibly to correct or omit inconsistent labels.
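A sketch of the one-versus-all ROC computation described above, using scikit-learn, is given below; `y_true` (integer stress labels) and `y_score` (per-class predicted probabilities) are assumed to be collected from the cross-validation folds.

```python
# Sketch of per-class (one-vs-all) ROC curves and AUCs with scikit-learn.
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def per_class_roc(y_true, y_score, classes):
    y_bin = label_binarize(y_true, classes=classes)   # one indicator column per class
    curves = {}
    for i, c in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
        curves[c] = (fpr, tpr, auc(fpr, tpr))          # store the per-class AUC
    return curves
```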

Fig. 5. ROC for the classification results of the ResNet50 model.

4 Summary and Future Work

In this work we trained a deep model of work stress using a spectral representation of heart rate extracted with the short-time Lomb-Scargle transform. The experiments showed that deep models trained on this spectral representation easily outperform those trained on traditional HRV features when detecting naturalistic stress levels. Additionally, we showed that even relatively shallow models featuring at least 5 convolutional layers can match the performance of ResNet50, which could indicate that at least some shallow hierarchical features are important for effectively modelling naturalistic stress from heart rate.

In future work we plan to investigate the specific features learned by our model through feature activation analysis. We also plan to train and compare the model on other affective computing datasets for emotion. Finally, it would greatly benefit the affective computing community if we could develop optimal representations that enable the aggregation of data from different devices and datasets to build a single model containing the richest features for HRV analysis.