
1 Introduction

The world’s population is aging rapidly. The number of people aged 60 and above has doubled since 1980, and the number of people aged 80 and above is estimated to increase more than fourfold (to 395 million) by 2050 (P. Dines (2014)). In old age, people tend to suffer from various physical and mental illnesses. Depression is common in late life, affecting around 1 in 7 people aged 65 and above in the United States and the United Kingdom (JA. Yesavage (1983); Anderson (2001)). Without timely treatment, depression can lead to severe consequences, including increased cancer rates, increased mortality and a substantially higher risk of suicide. To assess an elderly person’s depression level, current methods require care-givers or community workers to carry out depression screenings or surveys manually and periodically. Because of the labor cost, the time required and possible communication difficulties, the monitoring of elderly depression remains inefficient and delayed (S. Crystal (2003); A. Akincigil (2012)).

In this work, we consider an ideal depression-monitoring system that automatically infers changes in elderly people’s depression levels, so as to efficiently identify those at risk of becoming more depressed. To this end, we capture the daily activity patterns of elderly residents using the Internet of Things (IoT), and infer changes in their depression levels from the collected behavior features using machine learning. Our system enables care-givers to keep track of potential depression level changes for each elderly person with much less manual effort. Timely intervention can then be provided to at-risk elderly people as necessary, and rapid feedback on related treatments can be obtained.

We implement and evaluate our system with real data collected from wireless sensors installed in the individual homes of a group of elderly people. Surveys are conducted to record their Geriatric Depression Scale (GDS) scores. For evaluation, we divide the data into two time periods: we train predictive models on data from the earlier period and test them on data from the later period. Experimental results show that our system can correctly identify >80% of the elderly whose GDS scores have increased, with a false positive rate <20%. The main contributions of this paper are summarized as follows:

  • We deploy IoT-enabled smart homes equipped with motion-detection and reed-switch wireless sensors and a backend server. The system aims to monitor residents’ behaviors continuously and unobtrusively while protecting their privacy as much as possible.

  • We derive meaningful behavior features from time-series sensor readings and investigate the impact of different behavior features on changes in elderly depression levels.

  • We build predictive machine learning models that identify elderly people whose depression levels have increased, based on their behavior features. Experimental evaluations are conducted using real-world data.

To the best of our knowledge, this is the first work that predicts depression level changes from behavior patterns and evaluates the predictions in a real-world system. The rest of the paper is organized as follows. We summarize related work in Sect. 2. We introduce our dataset collection methodology in Sect. 3. We extract behavior features in Sect. 4 and analyze the impact of these features in Sect. 5. We explain our machine learning framework in Sect. 6. We present experimental evaluation results in Sect. 7, and conclude this paper in Sect. 8.

Fig. 1. Implementation of our IoT monitoring system.

2 Related Work

Our interdisciplinary work is inspired by prior research on depression, IoT monitoring systems and data mining applications. Previous depression studies provide a theoretical basis for measuring depression levels, and a reference for selecting and interpreting activity features indicative of depression. Existing IoT monitoring systems and data mining applications provide techniques for collecting activity data and for using data features to build early-warning systems.

Study of Depression. Much research effort has been devoted to diagnosing and mitigating depression in the elderly (Anderson (2001); S. Crystal (2003); P. Dines (2014)). The Geriatric Depression Scale (GDS) was designed specifically for elderly depression screening (JA. Yesavage (1983)) and is now used worldwide. Various activities have been found to be related to depression. For example, sleep disturbances, late bedtimes and short sleep durations are reported to be associated with increased depressive symptoms (N. Sakamoto (2013); G. Livingston (1993); Y. Kaneita (2006)). There is also an association between social isolation and late-life depression (Alexopoulos (2005, 2010)). Moreover, the association between toileting patterns (nocturia, urinary incontinence, overactive bladder, etc.) and depression has been studied and verified (KS. Coyne (2008); Rosen (2013); BH. Zorn (1999)). These works provide valuable insights for designing suitable activity features to monitor. Unlike these prior works, which rely solely on questionnaires, we use the Internet of Things for unobtrusive, real-time monitoring.

IoT Monitoring Systems. Many works have also explored monitoring the status of elderly people using the Internet of Things (IoT), deploying a variety of wireless sensors to infer different conditions of individuals. For example, anomalous situations are studied using motion sensors and water-flow sensors with pre-defined rules (Tsukiyama (2015)). The status of elderly people performing essential daily activities is investigated using force sensors attached to furniture (J. Shin (2011)). Falls are detected from floor vibrations measured by sensors (M. Alwan (2006)). These systems, however, do not monitor the mental status of the elderly. In contrast, our work focuses on the early identification of elderly people at risk of depression, and analyzes the impact of different activity features on depression level changes.

Recently, the role of IoT in monitoring mental status has started to attract research attention (T. Glenn (2014); M. Gawannavar (2016); A. Londral (2013)). The potential of IoT to improve business interaction in depression prevention and treatment has been examined (Nejati (2012)). For smartphone users, mental conditions can be analyzed from the subject’s social network feed (A. Ghose (2013)). The use of cameras and image processing for emotion detection has been discussed (L. Y. Mano (2016)). A conceptual approach has been described for detecting depression in older adults through gesture recognition (E. Almeida (2014)). All these works demonstrate the promising potential of inferring one’s mental status using IoT systems. Motivated by them, we deploy such a system in the homes of 40 elderly people, specifically targeting elderly depression by monitoring daily activities, and provide evaluation results based on real-world data.

Data Mining. Our work is also inspired by emerging research on data mining applications (I. Milho (2000); A.L. Gomes (2015)). H. Lakkaraju (2015) predicts students at risk of not graduating on time using metrics such as GPA, tardiness rates and absence rates. R. Wang (2015) predicts GPA from students’ behaviors inferred from smartphone sensing data. B. Du (2016) detects pickpocket suspects on public transportation by analyzing large-scale transit records. These works share a similar rationale with ours but focus on different domains.

Fig. 2. (a) In both periods, around half of the elderly have increased GDS scores (indicating that they have become more depressed), while the other half have decreased or unchanged GDS scores; neither trend is clearly dominant. (b) An individual’s GDS trends in the two periods are sometimes the same and sometimes opposite, so the trend in period 2 can hardly be predicted from that in period 1 alone.

3 Dataset Collection

The dataset we work on consists of two parts: (1) time-series IoT sensor readings that indicate the position and activity of each elderly person, and (2) survey data containing assessments of the elderly’s depressive symptoms. The rest of this section describes how each part was collected.

3.1 IoT Deployment

We deployed PIR (Passive Infra-Red) sensors and reed switches in the homes of 40 elderly people who live alone and are served by a small pool of community care-givers on an as-needed basis. The nominal apartment and sensor layout is shown in Fig. 1(a). PIR sensors detect a resident’s motion in different rooms and can be used to track the resident’s location in the apartment. Reed switches detect the opening and closing of the main door. By combining the main-door sensor readings with the absence of PIR triggers in the home, we can detect going-out time slots when the resident is away.

Our system infrastructure is shown in Fig. 1(b). Wireless sensors are configured to sense and log their status every 10 s. A gateway aggregates the sensor logs and sends the data to a central server every 2 min. The central server processes and analyzes the data and updates each elderly person’s status in a database. Care-givers can log in to view an elderly person’s status or receive notifications for certain pre-defined cases. The sensor data has been collected continuously for more than one year, and we use these readings to extract each elderly person’s daily behaviors (described in Sect. 4).

3.2 Depression Survey

To understand the depression status of our elderly participants, we assess their depressive symptoms using the GDS (JA. Yesavage (1983)). The GDS long form is a 30-item questionnaire designed to identify depression in the elderly. GDS scores range from 0 to 30, with higher scores indicating a higher level of depression. To track GDS changes, we conducted three periodic surveys, in March 2016, September 2016 and March 2017, at intervals of around 6 months. During each survey, the elderly were asked about their status and feelings according to the questionnaire, and the total GDS score of each elderly person was calculated and recorded.

In this paper, we focus on the GDS change between two consecutive surveys. We define the time period from the 1st survey to the 2nd survey as “period 1”, and the period from the 2nd survey to the 3rd survey as “period 2”. The CDFs of GDS changes for period 1 and period 2 are shown in Fig. 2(a). In both periods, around half of our elderly participants have increased GDS scores, i.e., exhibit more depressive symptoms. These elderly people with increased GDS scores are the target group we aim to identify, as they are at risk of becoming more depressed. Fig. 2(b) shows the GDS changes in period 1 and period 2 for 10 randomly selected elderly people. Over the two adjacent half-year periods, some of them show the same trend while others show opposite trends; that is, current GDS changes cannot simply be inferred from historical changes. Therefore, we need to continuously monitor the elderly’s behaviors to keep track of their depression status.
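The period-wise GDS changes and the binary target group described above can be sketched as follows. This is a minimal illustration, not the authors’ code: the participant IDs and scores are hypothetical, “increased” means a strictly positive change, and decreased or unchanged scores form the other group, as in Fig. 2(a).

```python
from typing import Dict, Tuple

# Hypothetical GDS totals (0-30) for the March 2016, September 2016 and March 2017 surveys.
surveys: Dict[str, Tuple[int, int, int]] = {
    "elder_01": (8, 11, 9),
    "elder_02": (5, 4, 7),
    "elder_03": (12, 14, 15),
}


def gds_changes(scores: Tuple[int, int, int]) -> Tuple[int, int]:
    """Return the (period 1, period 2) GDS changes from three consecutive surveys."""
    s1, s2, s3 = scores
    return s2 - s1, s3 - s2


def summarize(surveys: Dict[str, Tuple[int, int, int]]) -> Dict[str, dict]:
    rows = {}
    for elder_id, scores in surveys.items():
        d1, d2 = gds_changes(scores)
        rows[elder_id] = {
            "delta_p1": d1,
            "delta_p2": d2,
            "increased_p1": d1 > 0,  # target group: more depressive symptoms
            "increased_p2": d2 > 0,
            # Same trend if both periods fall in the same binary group
            # (increased vs. decreased/unchanged).
            "same_trend": (d1 > 0) == (d2 > 0),
        }
    return rows


rows = summarize(surveys)
share_p1 = sum(r["increased_p1"] for r in rows.values()) / len(rows)
share_p2 = sum(r["increased_p2"] for r in rows.values()) / len(rows)
print(f"increased in period 1: {share_p1:.0%}, period 2: {share_p2:.0%}")
```

The `same_trend` flag is what Fig. 2(b) visualizes: it varies across individuals, which is why period 1 labels alone cannot stand in for period 2 labels.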

4 Data Processing and Feature Extraction

The raw sensor readings provide only low-level information, i.e., the timestamps at which different sensors are triggered, and no high-level information on the elderly’s behavior that is likely to indicate GDS changes.

In what follows, we interpret the raw sensor readings to infer activity patterns in two steps: first, we process the sensor triggerings to obtain each elderly person’s location over the day; second, we use the time-stamped locations to extract activity features, such as sleeping patterns (e.g., sleep duration and interruptions), toileting habits (e.g., nocturia symptoms and the number of long toileting events) and social behaviors (e.g., going-out duration). We describe the details in the rest of this section.

Fig. 3. An example of sensor data processing for one elderly person on a typical day.

4.1 Processing Sensor Triggerings

In our IoT system, we sample the sensors every 10 s and record which sensor, if any, is triggered. From the raw sensor data, we first filter out timestamps with no sensor triggerings and keep the remaining informative timestamps. Figure 3(a) shows an example of the timestamps and the corresponding triggered sensors for one elderly person on a typical day.

We then determine the time spent in each room and outside the home. To do so, we first identify the “transition timestamps” at which the triggered sensor changes, e.g., from the bedroom sensor to the living room sensor. We then compute the time difference between each pair of consecutive “transition timestamps” to obtain the duration of each stay in a room. According to the home layout in Fig. 1(a), an elderly person may have to pass through the living room and kitchen when going from the bedroom to the bathroom. To rule out such short pass-by stays, any stay shorter than 30 s is merged into the nearest subsequent valid stay. To find the time slots when an elderly person is not at home, we detect pairs of door sensor triggerings with no resident motion in between for longer than 30 min. Applying this process to the example data in Fig. 3(a) yields the processed data shown in Fig. 3(b). Comparing the two figures, some sensor triggerings in Fig. 3(a) have been filtered out because they are suspected to be pass-bys or irrelevant door switches.
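The processing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: readings are hypothetical `(seconds since midnight, sensor label or None)` pairs sampled every 10 s; the labels `"bedroom"`, `"living"`, `"bathroom"`, `"door"` and the pseudo-location `"out"` are made up; and a door triggering that starts an away slot is relabeled `"out"` while all other door triggerings are dropped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SAMPLE_PERIOD_S = 10   # sensors are sampled every 10 s
MIN_STAY_S = 30        # stays shorter than 30 s are treated as pass-bys
MIN_AWAY_S = 30 * 60   # door pair with no motion for > 30 min => going out
DOOR = "door"          # hypothetical label for the main-door reed switch


@dataclass
class Stay:
    location: str
    start: int  # seconds since midnight
    end: int


def detect_away(events: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Consecutive door triggerings with no motion in between for > 30 min."""
    return [(t0, t1) for (t0, s0), (t1, s1) in zip(events, events[1:])
            if s0 == DOOR and s1 == DOOR and t1 - t0 > MIN_AWAY_S]


def relabel(events: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Turn a leaving door event into an 'out' event; drop all other door events."""
    away_starts = {t0 for t0, _ in detect_away(events)}
    labeled = []
    for t, sensor in events:
        if sensor != DOOR:
            labeled.append((t, sensor))
        elif t in away_starts:
            labeled.append((t, "out"))
        # Other door events (returning home, irrelevant switches) are discarded.
    return labeled


def to_stays(points: List[Tuple[int, str]], end_time: int) -> List[Stay]:
    """Split the day at transition timestamps, where the location changes."""
    stays: List[Stay] = []
    for t, loc in points:
        if stays and stays[-1].location == loc:
            continue
        if stays:
            stays[-1].end = t
        stays.append(Stay(loc, t, t))
    if stays:
        stays[-1].end = end_time
    return stays


def merge_short(stays: List[Stay]) -> List[Stay]:
    """Merge stays shorter than MIN_STAY_S into the next valid stay."""
    merged: List[Stay] = []
    pending_start: Optional[int] = None
    for st in stays:
        if st.end - st.start < MIN_STAY_S:
            if pending_start is None:
                pending_start = st.start
            continue
        start = st.start if pending_start is None else pending_start
        pending_start = None
        if merged and merged[-1].location == st.location:
            merged[-1].end = st.end  # e.g. bedroom -> short pass-by -> bedroom
        else:
            merged.append(Stay(st.location, start, st.end))
    if pending_start is not None and merged:
        merged[-1].end = stays[-1].end  # trailing pass-bys join the last stay
    return merged


def process_day(readings: List[Tuple[int, Optional[str]]]) -> List[Stay]:
    events = [(t, s) for t, s in readings if s is not None]  # drop empty samples
    if not events:
        return []
    return merge_short(to_stays(relabel(events), events[-1][0] + SAMPLE_PERIOD_S))


readings = [
    (0, "bedroom"), (3600, "living"), (3620, "bathroom"), (3920, "living"),
    (4500, DOOR), (4510, None), (9000, DOOR), (9010, "living"), (9100, "living"),
]
for stay in process_day(readings):
    print(stay)
```

In this toy day, the 20 s pass through the living room is absorbed into the following bathroom stay, and the two door triggerings 75 min apart become a single `"out"` slot.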

4.2 Extracting Activity Features

Based on the time spent in each location, we now extract patterns of specific activities that might suggest depression, including sleeping, toileting and going out (N. Sakamoto (2013); KS. Coyne (2008); Alexopoulos (2005)). All features are calculated on a daily basis.

Fig. 4. Distribution of selected activity features among all the elderly in 2016 (each bin shows the percentage of elderly people whose feature values fall in the range indicated on the x-axis).

Sleeping. Sufficient, good-quality sleep plays an important role in physical and mental health. We calculate sleep durations from the time slots spent in the bedroom, discarding slots shorter than 30 min as unlikely to be real sleep. In this way, we calculate both daytime and night-time sleep durations. Figure 4(a) shows the distribution of night-time sleep durations among all the elderly. A large percentage of the elderly sleep 6–10 h each night on average, but some sleep less than 5 h per night, which may require care-givers’ attention. Based on the sleep slots, we also calculate going-to-bed time, getting-up time, sleep interruptions and details of the longest uninterrupted sleep slot.
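A minimal sketch of these sleep features, assuming stays are time-ordered `(location, start, end)` tuples for one “sleep day” (e.g., noon to noon). The paper does not define the night window, so the 20:00–08:00 window and classifying a slot by its start time are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

StayRecord = Tuple[str, datetime, datetime]  # (location, start, end)

MIN_SLEEP = timedelta(minutes=30)   # bedroom slots shorter than this are not sleep
NIGHT_START_H, NIGHT_END_H = 20, 8  # hypothetical night window (20:00-08:00)


def is_night(t: datetime) -> bool:
    return t.hour >= NIGHT_START_H or t.hour < NIGHT_END_H


def hours(slots) -> float:
    return sum((e - s).total_seconds() for s, e in slots) / 3600


def sleep_features(stays: List[StayRecord]) -> dict:
    slots = [(s, e) for loc, s, e in stays if loc == "bedroom" and e - s >= MIN_SLEEP]
    night = [(s, e) for s, e in slots if is_night(s)]  # classified by start time
    day = [(s, e) for s, e in slots if not is_night(s)]
    longest = max(night, key=lambda se: se[1] - se[0], default=None)
    return {
        "night_sleep_h": hours(night),
        "day_sleep_h": hours(day),
        "bed_time": night[0][0] if night else None,
        "get_up_time": night[-1][1] if night else None,
        "interruptions": max(len(night) - 1, 0),
        "longest_sleep_h": hours([longest]) if longest else 0.0,
        "longest_sleep_start": longest[0] if longest else None,
    }


base = datetime(2016, 5, 1, 12)  # noon: start of a hypothetical sleep day


def at(h: float, m: float = 0) -> datetime:
    return base + timedelta(hours=h, minutes=m)


stays = [
    ("living", at(0), at(1)),
    ("bedroom", at(1), at(2)),             # 13:00-14:00 nap
    ("bedroom", at(3), at(3, 10)),         # 10 min: ignored
    ("bedroom", at(10), at(14)),           # 22:00-02:00
    ("bathroom", at(14), at(14, 10)),      # night-time interruption
    ("bedroom", at(14, 10), at(18, 10)),   # 02:10-06:10
]
print(sleep_features(stays))
```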

Toileting. Certain toileting habits, such as frequent urination, can be bothersome and may have a profound impact on quality of life (KS. Coyne (2008)). Based on the time slots spent in the bathroom, we estimate when and how often each elderly person visits the bathroom. We consider bathroom stays longer than 1 min as valid toileting events. Accordingly, we calculate the frequency of bathroom usage while the resident is at home, and the number of long toileting events (duration >10 min). As an example, the distribution of toileting counts during sleep time is shown in Fig. 4(b). Most elderly people get up to use the bathroom fewer than 2 times per night, which is considered normal. However, some visit the bathroom more than 3 times, and even up to 8 times, during sleep hours, which may severely affect their sleep quality and calls for their care-givers’ attention.
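The toileting rules above (valid visit >1 min, long visit >10 min, night-time counts between going-to-bed and getting-up times) can be sketched as below. The stay format and the `"out"` location for time away are assumptions carried over from an illustrative representation, not the paper’s data format; the usage data are made up.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

MIN_TOILET = timedelta(minutes=1)    # bathroom stays > 1 min are toileting events
LONG_TOILET = timedelta(minutes=10)  # > 10 min counts as long toileting


def toileting_features(stays: List[Tuple[str, datetime, datetime]],
                       bed_time: Optional[datetime] = None,
                       get_up_time: Optional[datetime] = None) -> dict:
    visits = [(s, e) for loc, s, e in stays
              if loc == "bathroom" and e - s > MIN_TOILET]
    # Hours at home = all stays except those labeled "out".
    home_h = sum((e - s).total_seconds() for loc, s, e in stays if loc != "out") / 3600
    night = 0
    if bed_time is not None and get_up_time is not None:
        night = sum(1 for s, _ in visits if bed_time <= s < get_up_time)
    return {
        "toilet_count": len(visits),
        "toilet_per_home_hour": len(visits) / home_h if home_h else 0.0,
        "long_toilet_count": sum(1 for s, e in visits if e - s > LONG_TOILET),
        "night_toilet_count": night,
    }


base = datetime(2016, 5, 1, 22)  # 22:00, hypothetical bedtime


def at(h: float, m: float = 0) -> datetime:
    return base + timedelta(hours=h, minutes=m)


stays = [
    ("bedroom", at(0), at(4)),
    ("bathroom", at(4), at(4, 5)),      # 02:00, 5 min: valid
    ("bedroom", at(4, 5), at(5)),
    ("bathroom", at(5), at(5, 15)),     # 03:00, 15 min: valid and long
    ("bedroom", at(5, 15), at(8)),
    ("bathroom", at(8), at(8, 0.5)),    # 30 s: not a toileting event
    ("living", at(8, 0.5), at(12)),
    ("bathroom", at(12), at(12, 3)),    # 10:00, daytime visit
    ("out", at(12, 3), at(14)),
]
print(toileting_features(stays, bed_time=at(0), get_up_time=at(8)))
```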

Going-out. As these elderly people live alone, we consider going-out activities as a rough indicator of social activity. From the empty-flat time slots, we estimate going-out durations and the number of times the resident goes out. Fig. 4(c) shows the distribution of the hours of the day during which the elderly go out. The peak away hours are around 9 am to noon, possibly when the elderly go shopping or have meals at local morning markets.
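A sketch of the going-out features, assuming away slots are already labeled `"out"` in a list of `(location, start, end)` stays (an illustrative format, not the paper’s). Counting every clock hour that an away slot overlaps gives a per-day histogram; aggregating it over elderly people and days would give a distribution like Fig. 4(c).

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple


def going_out_features(stays: List[Tuple[str, datetime, datetime]]) -> dict:
    outs = [(s, e) for loc, s, e in stays if loc == "out"]
    away_hours: Counter = Counter()  # hour of day -> number of away slots overlapping it
    for s, e in outs:
        t = s.replace(minute=0, second=0, microsecond=0)
        while t < e:
            away_hours[t.hour] += 1
            t += timedelta(hours=1)
    return {
        "out_count": len(outs),
        "out_minutes": sum((e - s).total_seconds() for s, e in outs) / 60,
        "away_hours": away_hours,
    }


d = datetime(2016, 5, 2)
stays = [
    ("living", d.replace(hour=8), d.replace(hour=9, minute=20)),
    ("out", d.replace(hour=9, minute=20), d.replace(hour=11, minute=40)),
    ("living", d.replace(hour=11, minute=40), d.replace(hour=15)),
    ("out", d.replace(hour=15), d.replace(hour=15, minute=30)),
]
print(going_out_features(stays))
```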

5 Feature Correlation Analysis and Selection

To understand the relationship between behavior changes and GDS changes, we conduct a Pearson correlation analysis. In period 1 of our experiments, each elderly person completed two GDS surveys, one at the beginning and one at the end, and we take the difference between the two scores as the GDS change. For sleeping, toileting and going-out behaviors, we designed and extracted 14 features in total, each computed on a daily basis. For each of the two surveys, we average each feature over the three months before the survey date, and we quantify the behavior change as the difference between the two averaged values. The correlation results are presented in Table 1.

Table 1. Pearson correlations between changes in activity features and changes in GDS. (Shaded features show relatively strong correlations and are selected for our predictive models.)

Based on Table 1, we identify relatively strong correlations and select the corresponding features for our GDS predictive models. We tentatively select the top 5 relevant activity features, shown as the shaded rows. We find that becoming more depressed is correlated with shorter and later uninterrupted sleep, less time and motion in the bedroom during sleep hours, more time spent in the bathroom and more frequent going out. Although not all of the selected features have significantly low p-values, we argue that such activity measurements provide a new, unobtrusive indication of changes in elderly depression levels. The effectiveness of combining these features is evaluated in Sect. 7. Note that the data used for feature analysis and selection come from period 1, while the models are tested on period 2, so that feature selection does not use the test data.
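Selecting the top 5 features by correlation strength amounts to ranking by absolute Pearson coefficient, which can be sketched as below. The feature names and coefficients are made up for illustration and are not the values in Table 1.

```python
from typing import Dict, List


def select_top_features(correlations: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k feature names with the largest |Pearson r|."""
    return sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)[:k]


# Made-up coefficients for illustration only (not the values in Table 1).
example_r = {
    "longest_sleep_h": -0.42, "bed_time": 0.10, "bathroom_time": 0.35,
    "going_out_count": 0.31, "bedroom_motion": -0.38, "day_sleep_h": 0.05,
    "night_toilet_count": -0.22,
}
print(select_top_features(example_r))
```

Ranking by absolute value keeps both positively and negatively correlated features, which matches the mix of directions described in the text.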

6 The Machine Learning Framework

In this section, we first give an overview of the problem and then briefly introduce the models we use to solve it. We also describe the experimental setup, which we designed to closely match practical requirements.

Problem Overview: To provide early intervention to elderly people at risk of becoming more depressed, we first need to identify those whose GDS scores have increased. To do so, we use algorithms that learn from past GDS and behavior data and make predictions. For each elderly person, we have records of his/her GDS change and behavior changes between the two surveys. The problem of identifying elderly people at risk of becoming more depressed can therefore be formulated as a binary classification problem: whether the GDS score has increased is the target outcome, and the selected features are the predictors.
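The binary formulation above can be sketched as a small helper that turns per-elderly records into a feature matrix and labels. The record layout and feature names are hypothetical; the label is 1 only for a strictly positive GDS change, so unchanged scores fall into the negative class.

```python
from typing import Dict, List, Tuple


def build_dataset(records: Dict[str, dict],
                  feature_names: List[str]) -> Tuple[List[str], List[List[float]], List[int]]:
    """Predictors = behavior changes; label = 1 if the GDS score increased, else 0."""
    ids = sorted(records)
    X = [[records[i]["features"][f] for f in feature_names] for i in ids]
    y = [1 if records[i]["gds_change"] > 0 else 0 for i in ids]
    return ids, X, y


records = {
    "elder_01": {"gds_change": 3, "features": {"longest_sleep_h": -0.8, "bathroom_min": 4.0}},
    "elder_02": {"gds_change": 0, "features": {"longest_sleep_h": 0.2, "bathroom_min": -1.5}},
    "elder_03": {"gds_change": -2, "features": {"longest_sleep_h": 0.5, "bathroom_min": 0.0}},
}
ids, X, y = build_dataset(records, ["longest_sleep_h", "bathroom_min"])
print(ids, X, y)
```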

Predictive Models: To predict whether an elderly is at risk of becoming more depressed, we experiment with the following machine learning algorithms.

  • K-Nearest Neighbors (KNN) Altman (1992). This algorithm assigns a test sample to a class by majority vote among its k nearest training samples in the feature space.

  • Random Forests (RF) Ho (1995). This algorithm constructs multiple decision trees at training time and combines their classification results by voting.

  • Logistic Regression (LR) Duncan (1967). This algorithm models the probability of the positive class with a generalized linear model (the logistic function of a linear combination of the features) and outputs binary classification outcomes.

We use Python’s scikit-learn library to implement all these algorithms.
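A minimal scikit-learn sketch of the three classifiers. The paper does not report hyperparameters or preprocessing, so `n_neighbors`, `n_estimators`, the standardization step and the `make_models` helper are all assumptions, and the toy data are made up.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_models(n_neighbors: int = 5, seed=None) -> dict:
    """The three classifiers; hyperparameters and scaling here are assumptions."""
    return {
        # KNN and LR are sensitive to feature scales, so features are standardized first.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "LR": make_pipeline(StandardScaler(), LogisticRegression()),
    }


# Toy behavior-change features (two columns) and labels (1 = GDS increased).
X = [[-1.0, 3.0], [-0.5, 2.0], [-0.8, 2.5], [0.6, -1.0], [0.4, -0.5], [0.9, -2.0]]
y = [1, 1, 1, 0, 0, 0]
for name, model in make_models(n_neighbors=3, seed=0).items():
    model.fit(X, y)
    print(name, model.predict([[-0.7, 2.2], [0.7, -1.2]]))
```

Wrapping KNN and LR in a pipeline keeps the scaler fitted only on training data, which matters when training on period 1 and testing on period 2.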

Experiment Setup: Our datasets include the activity features and GDS change outcomes for period 1 (March – September 2016) and period 2 (October 2016 – March 2017). Since the goal is an early-warning system that predicts future GDS outcomes, a suitable way to evaluate an algorithm is to train it on data from an earlier period and test it on data from a later period. We perform all evaluations in this manner, training models on period 1 data and testing them on period 2 data. To account for the randomness of some models (e.g., RF), we carry out 1,000 runs of each model and average the predictions to compute the final results.
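The train-on-period-1, test-on-period-2 protocol with averaging over runs can be sketched as follows. Averaging the predicted probability of the “increased” class, and the `make_model(seed)` factory, are assumptions about how “average the predictions” is implemented; the toy data are made up, and the usage uses 20 runs instead of 1,000 to stay fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def averaged_risk_scores(make_model, X_train, y_train, X_test, n_runs: int = 1000):
    """Train on period 1, score period 2, and average P(GDS increase) over runs."""
    scores = np.zeros(len(X_test))
    for run in range(n_runs):
        model = make_model(seed=run)  # a new random seed per run
        model.fit(X_train, y_train)
        scores += model.predict_proba(X_test)[:, 1]  # column 1 = class 1 ("increased")
    return scores / n_runs


def make_rf(seed):
    return RandomForestClassifier(n_estimators=10, random_state=seed)


X_p1 = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]  # period-1 features (toy)
y_p1 = [0, 0, 0, 1, 1, 1]                             # period-1 labels (toy)
X_p2 = [[0.5], [11.5]]                                # period-2 features (toy)
risk = averaged_risk_scores(make_rf, X_p1, y_p1, X_p2, n_runs=20)
print(risk)
```

For deterministic models such as KNN or LR, every run gives the same output, so averaging only matters for randomized models like RF.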

Fig. 5. Performance evaluation of different GDS change predictive models.

7 Evaluation Results

We now present the evaluation results. We first show basic classification performance using standard ROC curves. We then consider the practical constraints of community care-givers and evaluate the effectiveness of our system when only a limited number of at-risk elderly people can be attended to. Finally, we investigate the relationship between the estimated risk scores and the actual GDS change values.

ROC Curves: Since our predictive models are binary classifiers, we use ROC (Receiver Operating Characteristic) curves to evaluate their performance. We plot the ROC curves for KNN, RF and LR in Fig. 5(a). Overall, the KNN model performs best; only at false positive rates around 0.15 does RF show a slightly higher true positive rate than KNN. More specifically, with KNN, if we tolerate a false positive rate of <20%, we can correctly identify >80% of the elderly who have become more depressed, and at a false positive rate of around 40%, we can identify all of the target elderly.
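Reading an operating point such as “TPR at FPR <20%” off an ROC curve can be sketched with scikit-learn’s `roc_curve`. The helper name and the toy labels and scores are hypothetical; the strict inequality mirrors the “<20%” wording in the text.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def tpr_below_fpr(y_true, scores, max_fpr: float = 0.2) -> float:
    """Highest true positive rate reachable with a false positive rate < max_fpr."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(np.max(tpr[fpr < max_fpr]))  # fpr[0] == 0, so the mask is never empty


y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.5, 0.6, 0.8, 0.9, 0.95]
print(tpr_below_fpr(y_true, scores, 0.2), roc_auc_score(y_true, scores))
```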

Fig. 6. Actual GDS change shows an increasing trend similar to that of the risk scores (KNN).

Precision, Recall and F-score at Top K Percentage Highest-Risk Elderly: In practice, community care-giver resources are limited. As a result, care-givers may only be able to intervene for the top K% of elderly people with the highest risk of becoming more depressed. We therefore also measure how effectively our classifiers retrieve at-risk elderly people within the top K%. To this end, we calculate the recall, precision and F-score of the different algorithms for various values of K. Recall is the number of target elderly correctly identified by the classifier divided by the total number of target elderly, and precision is the number of target elderly correctly identified divided by the total number of elderly identified. The F-score combines recall and precision. To rank the elderly by risk, we use the classifiers’ decision function or probability values as risk scores.
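The top-K% evaluation can be sketched as follows. Rounding the number of flagged elderly up and using F1 (the harmonic mean of precision and recall) as the F-score are assumptions, since the paper does not specify either; the toy labels and scores are made up.

```python
import math
from typing import List, Tuple


def top_k_metrics(y_true: List[int], scores: List[float],
                  k_percent: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 when the top K% highest-risk elderly are flagged."""
    n = len(scores)
    n_pick = math.ceil(n * k_percent / 100)  # rounding up is an assumption
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    picked = ranked[:n_pick]
    tp = sum(y_true[i] for i in picked)  # flagged elderly whose GDS increased
    positives = sum(y_true)
    precision = tp / n_pick if n_pick else 0.0
    recall = tp / positives if positives else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
for k in (40, 65, 100):
    print(k, top_k_metrics(y_true, scores, k))
```

Sweeping K and plotting each of the three values gives curves of the kind shown in Fig. 5(b)–(d).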

Figure 5(b), (c) and (d) show the recall, precision and F-score values, respectively, for the top K% highest-risk elderly. Consistent with the ROC curves, KNN generally performs better than the other classifiers (except around K = 40%, where RF performs slightly better). The performance gain of KNN is large when K is between 40% and 90%. In particular, Fig. 5(b) shows that when the top 65% highest-risk elderly are selected for further attention, KNN successfully identifies all of the target elderly. In this case, compared with the traditional approach of conducting GDS screenings for every elderly person, our method requires 35% fewer screenings (65% instead of 100% of the elderly).

Risk Scores vs. GDS Change Values: Beyond the binary classification results, community care-givers are also interested in the magnitude of GDS changes. We therefore investigate the relationship between the actual GDS change and the risk scores output by the classifiers, taking KNN as an example. To plot Fig. 6, we first divide the normalized risk scores into 7 equally spaced bins in ascending order (bin width = 1/7), and then calculate the average GDS change of the elderly in each bin. Fig. 6 shows a similar trend between the risk scores and the averaged actual GDS changes: higher risk scores roughly indicate larger GDS increases, which call for more urgent intervention from community care-givers.
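The binning step can be sketched as below, assuming risk scores are already normalized to [0, 1] (e.g., predicted probabilities). The helper name and toy values are hypothetical; empty bins are reported as `None` rather than plotted.

```python
from typing import List, Optional


def binned_gds_change(risk_scores: List[float], gds_changes: List[float],
                      n_bins: int = 7) -> List[Optional[float]]:
    """Average actual GDS change per equal-width risk-score bin (scores in [0, 1])."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for r, g in zip(risk_scores, gds_changes):
        b = min(int(r * n_bins), n_bins - 1)  # a score of exactly 1.0 goes to the last bin
        sums[b] += g
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]  # None = empty bin


risk = [0.05, 0.1, 0.5, 0.55, 0.95, 1.0]
gds = [-2, 0, 1, 3, 4, 6]
print(binned_gds_change(risk, gds))
```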

8 Conclusion and Future Work

In this paper, we have proposed a novel scheme that uses IoT systems to unobtrusively and efficiently monitor changes in elderly depression status and identify elderly people at risk of becoming more depressed. We have implemented our monitoring system in 40 elderly people’s homes and conducted experiments for more than one year. Extensive evaluation results have shown the effectiveness of the proposed scheme. We aim to build a scalable depression early-warning system so that community care-givers can identify and prioritize at-risk elderly people. Although this work is designed to track depression level changes, we believe its principle and rationale can apply to other health conditions as well, and we plan to generalize our monitoring system in the future. We hope to demonstrate the potential of data-driven healthcare, especially in mental health, and to benefit more people in need.