Physical activity classification in free-living conditions using smartphone accelerometer data and exploration of predicted results

https://doi.org/10.1016/j.compenvurbsys.2017.09.012

Highlights

  • The aim of this study is to propose an approach to physical activity classification in free-living conditions.

  • Two supervised machine learning classifiers are tested to classify four physical activity types.

  • Two learning models identify different daily activities with a series of plausible physical activity types.

  • Publicly released accelerometer data are found to be valid in the physical activity classification in free-living conditions.

Abstract

In recent decades, decreasing physical activity has emerged as one of the major issues affecting human health as people increasingly engage in sedentary behavior in their homes and workplaces. In physical activity research, using GPS trajectories and advanced GIS methods has the potential to greatly enhance our understanding of the association between objectively measured moderate and vigorous physical activity and physical and social environments. Relying only on objectively measured physical activity intensity, however, ignores the role of different places and types of physical activity in people's health outcomes. The aim of this study is to propose an approach to classifying physical activity in free-living conditions for physical activity research using published smartphone accelerometer data. Random forest and gradient boosting are used to predict jogging, walking, sitting, and standing. Models trained with the two classifiers are tested on accelerometer data collected from the smartphones of two subjects in free-living conditions. GPS trajectories with predicted physical activity labels are visually explored on a map to offer new insight into the assessment of the predicted results of daily activities and the identification of any difference in the results between random forest and gradient boosting. The findings of this study indicate that random forest and gradient boosting enable accurate physical activity classification in free-living conditions. GPS trajectories linked with predicted labels on a map assist the visual exploration of erroneous predictions in daily activities, including in-vehicle activities.

Introduction

In recent decades, decreasing physical activity (PA) has emerged as one of the major issues affecting human health as people increasingly engage in sedentary behavior in their homes and workplaces. Moreover, obese people spend less time on moderate to vigorous physical activity (MVPA) than non-obese people (Hagströmer, Troiano, Sjöström, & Berrigan, 2010). MVPA, such as brisk walking, bicycling, and jogging, contributes to reduced risk of physical and mental health problems (e.g., cardiovascular diseases, type II diabetes, obesity, depression, anxiety, and reduced well-being) (Fox, 1999; Gordon-Larsen et al., 2006; Physical Activities Guidelines Advisory Committee, 2008; Wei et al., 2000). Metabolic syndrome is associated with a number of psychiatric disorders (Ho, Zhang, Mak, & Ho, 2014), and depression is a common comorbidity (Quek, Tam, Zhang, & Ho, 2017). Depression and obesity share a common pathological mechanism (Yang, De Xiang Liu, Pan, Ho, & Ho, 2016). PA and exercise lead to a significant reduction of stress levels as compared to short-term pharmacological treatment (Lu et al., 2017). A number of scholars have attempted to identify factors in the physical and social environments that have significant positive or negative effects on people's PA. Many studies have examined the relationship between PA and specific characteristics of the built environment, including green spaces, based on neighborhood areas using geographic information systems (GIS) (Cohen et al., 2006; Coombes et al., 2010; McGinn et al., 2007; Nagel et al., 2008; Saelens and Handy, 2008; Sallis et al., 2016).

In PA research, using GPS trajectories and advanced GIS methods has the potential to greatly enhance our understanding of the association between objectively measured MVPA and physical and social environments (Browning & Lee, 2017). By taking into account people's daily activities and travel, insights may be obtained to better inform policies or measures that seek to promote PA (Almanza et al., 2012; Boruff et al., 2012; Cooper et al., 2010; Helbich et al., 2016; Lachowycz et al., 2012; Rodríguez et al., 2012; Troped et al., 2010). More importantly, recent studies using GPS trajectories confirm the importance of non-residential contexts (e.g., workplaces or locations for routine activities) in people's daily life as well as areas around their residential neighborhoods (Diez Roux and Mair, 2010; Inagami et al., 2007; Kwan, 2012a; Kwan, 2012b; Perchoux et al., 2013). In most existing research, MVPA and sedentary behavior are determined using thresholds on counts per minute, a measure calculated automatically by commercial accelerometers (Berlin et al., 2006; Frank et al., 2005; Freedson et al., 1998; Jones et al., 2009; Saelens et al., 2003).

Relying only on the objectively measured intensity of PA, however, ignores the role of different places and types of PA in people's health outcomes. As a result, our knowledge about what types of PA were undertaken and what contextual characteristics are associated with healthy behaviors is limited when intensity alone is used. In this context, Jankowska, Schipperijn, and Kerr (2015) highlighted the importance of understanding individual health behaviors over space and time related to PA and the need to go beyond using intensity. Thus, the specific types rather than the intensity of PA are critical to a better understanding of the association between PA and certain environmental influences, and accurate classification of PA types needs to be studied.

Many studies in the healthcare domain have been conducted to recognize different types of daily activities and PA using raw accelerometer data collected from subjects under controlled conditions (Anguita et al., 2012; Arif et al., 2014; Kwapisz et al., 2011; Yin et al., 2008; Zhang et al., 2010; Zhu and Sheng, 2011). Mobile phone applications provide a low-cost technology, which allows clinicians to monitor the PA of their patients without any technical knowledge (Zhang et al., 2014). The identification of different activity types enables the detection of abnormal behaviors when monitoring elderly people for their healthcare, the examination of the association between PA and its health effects on people, and the provision of feedback through mobile applications to encourage people to engage in PA. Further smartphone innovations are also helpful to caregivers of individuals with dementia, improving not only their PA (Zhang et al., 2016) but also rehabilitation (Zhang, Yeo, & Ho, 2015). In the existing studies, machine learning techniques played an important role in building models based on a set of features derived from raw accelerometer data to predict various PA types. Machine learning is a branch of artificial intelligence and helps to predict outcomes after models/algorithms are trained using a large amount of input data. Features in machine learning are informative, quantifiable attributes derived from the input data, such as the mean and standard deviation, that models use to determine different labels (or classes) within an acceptable range. The classification algorithms in existing studies, however, are mostly tested on accelerometer data collected under controlled situations. In other words, how daily activities and PA, which might include a variety of uncontrolled activities, can be represented by the restricted types in the training accelerometer data needs to be understood. Therefore, a validation process needs to be performed to unveil how the accelerometer data collected in a laboratory setting provide convincing predicted results in various daily activities.
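
To make the notion of features concrete, the following minimal Python sketch computes the per-axis mean and standard deviation of a short run of tri-axial accelerometer readings. It is an illustration only: the function name `extract_features`, the feature names, and the use of the population standard deviation are assumptions, not the feature set used in this study.

```python
from statistics import mean, pstdev


def extract_features(window):
    """Summarize a window of (x, y, z) accelerometer readings.

    Returns a dict holding the mean and standard deviation of each axis.
    The feature names and the population standard deviation are
    illustrative assumptions, not the study's actual feature set.
    """
    features = {}
    # zip(*window) regroups the readings into three per-axis sequences.
    for axis_name, values in zip("xyz", zip(*window)):
        features[f"mean_{axis_name}"] = mean(values)
        features[f"std_{axis_name}"] = pstdev(values)
    return features
```

A classifier would then receive one such feature dictionary (or the equivalent vector) per window rather than the raw readings.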

The aim of this study is to propose an approach to PA classification in free-living conditions for PA research using published smartphone accelerometer data. Free-living conditions refer to the natural everyday settings in people's daily lives in contrast to artificial laboratory conditions. Two supervised machine learning classifiers, random forest and gradient boosting, are used to generate training models based on publicly released accelerometer data for predicting different PA types and comparing their performance. The PA types identified by the proposed classification algorithm are jogging, walking, sedentary status, and standing. The performance of the generated predictive models is assessed in two different ways: 1) by cross-validation and 2) using test accelerometer data collected from a smartphone of one adult subject in free-living conditions. Because the published accelerometer data used to train models were collected under controlled conditions, the assessment of the learning models using data recorded in uncontrolled daily life is critical for this study. For a more thorough examination of the models, a visual exploration of classified PA types over space and time is performed on a map using a set of GPS and accelerometer data collected from the smartphones of two subjects.

The process of building models developed in this study contributes to improving the performance of PA classification by highlighting practical strategies and considerations for collecting GPS and accelerometer data from human subjects. Further, it will be helpful to future studies that seek to advance the examination of the association between PA and environmental factors. The construction of learning models using publicly available accelerometer data with labels of PA types also will enable researchers to address the daunting challenge of requiring subjects in PA research to record the labels for every activity. Visualization of the classified types on GPS trajectories offers new insight into the assessment of the predicted results of daily activities and the identification of any difference in the results between the random forest and gradient boosting classifiers.

The sections in this paper are structured as follows: Section 2 summarizes existing studies on PA classification algorithms using accelerometer data. Section 3 describes the accelerometer data used in this study, the preprocessing of the accelerometer data, and a classification algorithm using random forest and gradient boosting classifiers that takes into account features extracted from accelerometer data instances (samples). Section 4 demonstrates the performance of random forest and gradient boosting and the application of the developed classification algorithm to the accelerometer data collected from two subjects in free-living conditions. Lastly, discussion and conclusions of the findings in this study are presented in Section 5.

Related work

Due to the widespread use of smartphones, researchers are paying increasing attention to their sensing capabilities. One of the sensors in smartphones, the tri-axial accelerometer, records accelerations along the x, y, and z axes, which may allow the recognition of different types of human activities. Many scholars in the public health and computer science domains have highlighted the potential of the built-in accelerometer sensor in smartphones to recognize different types of PA using

A physical activity classification algorithm

Because the WISDM data used in this study are labeled with six PA types, some types (e.g., going upstairs) are merged with similar PA types (e.g., walking) (Section 3.1). The raw accelerometer data need to be grouped into an analytic unit, called an example, for feature extraction (Section 3.2). In this study, a set of 200 consecutive raw instances (acceleration records during 10 s) of the identical PA-type label forms one example (subset of accelerometer data), which showed the
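
The grouping step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the record layout `(label, x, y, z)`, the function name `build_examples`, the non-overlapping windows, and the exact merge table are all hypothetical (the text only states that some types, such as going upstairs, are merged with similar types such as walking).

```python
from itertools import groupby

WINDOW_SIZE = 200  # 200 consecutive instances, i.e. 10 s, as stated in the text

# Assumed merge table: the text gives upstairs -> walking as an example;
# treating "Downstairs" the same way is an assumption made for illustration.
MERGED_LABEL = {"Upstairs": "Walking", "Downstairs": "Walking"}


def build_examples(records, window_size=WINDOW_SIZE):
    """Group time-ordered (label, x, y, z) records into fixed-size examples.

    Each example holds `window_size` consecutive readings that share one
    (merged) label. Windows do not overlap, and an incomplete tail at the
    end of a run of identical labels is dropped; both are assumptions.
    """
    merged = ((MERGED_LABEL.get(label, label), x, y, z)
              for label, x, y, z in records)
    examples = []
    # groupby yields maximal runs of consecutive records with the same label.
    for label, run in groupby(merged, key=lambda record: record[0]):
        samples = [(x, y, z) for _, x, y, z in run]
        for start in range(0, len(samples) - window_size + 1, window_size):
            examples.append((label, samples[start:start + window_size]))
    return examples
```

In a real pipeline the records would also be split by subject before windowing so that no example mixes readings from two people; that step is omitted here for brevity.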

Classification results using WISDM accelerometer data

The performance of the learning models generated using random forest and gradient boosting was evaluated with 10-fold cross-validation. The total number of examples is 10,243, and in each of the 10 iterations, 10% of them are randomly sampled as test data, taking into account the original proportion of each PA type. The predictive accuracy of the random forest and gradient boosting classifiers is 99.03% and 99.22%, respectively. Table 3 shows the predictive accuracy when
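
The fold construction described above, in which each test fold preserves the original proportion of each PA type, can be sketched as follows. This is a hedged, standard-library illustration: the function names `stratified_folds` and `accuracy` are hypothetical, the classifiers themselves are omitted, and the round-robin assignment is just one simple way to obtain stratified folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign every example to one of `n_folds` folds, class by class.

    Indices of each class are shuffled and dealt out round-robin, so each
    fold receives roughly the same share of every class. Returns a list
    whose i-th entry is the fold index of example i.
    """
    rng = random.Random(seed)
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)
    fold_of = [None] * len(labels)
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            fold_of[index] = position % n_folds
    return fold_of


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

For fold k, the examples with `fold_of[i] == k` would serve as test data and all others as training data; averaging `accuracy` over the ten folds gives a cross-validated estimate of the kind reported above.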

Discussion and conclusions

In this study, an approach to classifying PA in free-living conditions was proposed using random forest and gradient boosting, and the predictive accuracy of the two classifiers was assessed. These two classifiers achieved highly accurate classification with respect to jogging, walking, sitting, and standing activities in both controlled and free-living conditions. Particularly, the high accuracy obtained through the algorithm developed using accelerometer data collected under laboratory

References (61)

  • D.A. Rodríguez et al. Out and about: association of the built environment with physical activity behaviors of adolescent females. Health & Place (2012)
  • J.F. Sallis et al. Physical activity in relation to urban environments in 14 cities worldwide: A cross-sectional study. The Lancet (2016)
  • P.J. Troped et al. The built environment and location-based physical activity. American Journal of Preventive Medicine (2010)
  • C. Zhu et al. Motion-and location-based online human daily activity recognition. Pervasive and Mobile Computing (2011)
  • D. Anguita et al. Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine
  • M. Arif et al. Better physical activity classification using smartphone acceleration sensor. Journal of Medical Systems (2014)
  • L. Bao et al. Activity recognition from user-annotated acceleration data
  • J.E. Berlin et al. Using activity monitors to measure physical activity in free-living conditions. Physical Therapy (2006)
  • B.J. Boruff et al. Using GPS technology to (re)-examine operational definitions of ‘neighbourhood’ in place-based health research. International Journal of Health Geographics (2012)
  • L. Breiman. Random forests. Machine Learning (2001)
  • M. Browning et al. Within what distance does “Greenness” best predict physical health? A systematic review of articles with GIS buffer analyses across the lifespan. International Journal of Environmental Research and Public Health (2017)
  • T. Chen et al. xgboost: Extreme gradient boosting. R package version 0.4–4
  • D.A. Cohen et al. Public parks and physical activity among adolescent girls. Pediatrics (2006)
  • A.V. Diez Roux et al. Neighborhoods and health. Annals of the New York Academy of Sciences (2010)
  • K. Ellis et al. Identifying active travel behaviors in challenging environments using GPS, accelerometers, and machine learning algorithms. Public Health (2014)
  • K.R. Fox. The influence of physical activity on mental well-being. Public Health Nutrition (1999)
  • P.S. Freedson et al. Calibration of the Computer Science and Applications, Inc. accelerometer. Medicine and Science in Sports and Exercise (1998)
  • J.H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics (2001)
  • P. Gordon-Larsen et al. Associations among active transportation, physical activity, and weight status in young adults. Obesity Research (2005)
  • P. Gordon-Larsen et al. Inequality in the built environment underlies key health disparities in physical activity and obesity. Pediatrics (2006)