Elsevier

Information Fusion

Volume 53, January 2020, Pages 80-87
Imaging and fusing time series for wearable sensor-based human activity recognition

https://doi.org/10.1016/j.inffus.2019.06.014

Abstract

To facilitate data-driven and informed decision making, a novel deep neural network architecture for human activity recognition based on multiple sensor data is proposed in this work. Specifically, the proposed architecture encodes the time series of sensor data as images (i.e., encoding one time series into a two-channel image), and leverages these transformed images to retain the necessary features for human activity recognition. In other words, by imaging time series, wearable sensor-based human activity recognition can be realized with computer vision techniques for image recognition. In particular, to enable heterogeneous sensor data to be trained cooperatively, a fusion residual network is adopted, fusing two networks and training heterogeneous data with pixel-wise correspondence. Moreover, deep residual networks of different depths are used to accommodate differences in dataset size. The proposed architecture is then extensively evaluated on two human activity recognition datasets (i.e., the HHAR dataset and the MHEALTH dataset), which comprise various heterogeneous mobile device sensor combinations (i.e., acceleration, angular velocity, and magnetic field orientation). The findings demonstrate that our proposed approach outperforms competing approaches in terms of accuracy and F1-score.

Introduction

In our data-driven and data-rich society (e.g., real-time video feeds from CCTVs, and other sensing data from different data sources), Body Sensor Networks (BSN) [1] have gained widespread attention in the academic literature, including in human-computer interaction and ubiquitous computing (e.g., user identification [2] and human activity recognition [3]). Constant advances in hardware and software (e.g., inexpensive mobile devices with embedded powerful sensors and wireless technology [4], [5]) have also eased human activity recognition using sensor data from BSN; example applications include healthcare [6], [7], heart-rate-based emotion recognition [8], [9], activity monitoring [10], [11], and commercial applications such as fitness tracking [12] and signal processing in sensor node environments [13]. For example, human activity recognition can leverage the time series signals of mobile device sensors, from which representative features are extracted for classification and discrimination using various algorithms.

Traditionally, human activity recognition using mobile device sensors has been defined as a multivariate time series classification problem. A key step in solving this problem is feature extraction, relying for example on statistical features of the raw signal (e.g., variance, mean, entropy, and correlation coefficients) [14], or on transform-domain representations (e.g., Fourier and wavelet transforms of the signal). These heuristic features are widely used in analyzing time series data.
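To illustrate the kind of heuristic features referred to above, the sketch below computes a few of them for one window of raw signal using NumPy. The function name and the bin count used for the entropy estimate are our own illustrative choices, not values from the paper.

```python
import numpy as np

def heuristic_features(window):
    """Statistical features of one raw-signal window (1-D array).

    The 10-bin histogram used for the entropy estimate is an
    illustrative choice, not a value from the paper.
    """
    w = np.asarray(window, dtype=float)
    hist, _ = np.histogram(w, bins=10)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return {
        "mean": w.mean(),
        "variance": w.var(),
        "entropy": float(-(p * np.log2(p)).sum()),
    }
```

Cross-axis correlation coefficients, also mentioned above, could be obtained analogously with `np.corrcoef` over pairs of sensor axes.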

In the deep learning framework, by contrast, we can build a multi-layer deep structure to automatically extract relevant features. A deep learning model can be trained in both supervised and unsupervised manners, and it is particularly effective at processing image data. Moreover, the representation of time series features has recently attracted widespread attention, and the most successful approach is to describe features as visual cues [15]. Building on supervised and unsupervised learning techniques in computer vision, time series can be re-encoded into images to enable machines to perform image recognition. This technique has been applied in speech recognition [16], classification [17], and radio frequency identification [18], and has been shown to be effective.

Therefore, the proposed architecture in this work integrates a method that transforms sensor data into some visual images, and a framework that enables human activity recognition to be carried out by using deep residual networks in image recognition. Specifically, we summarize the key contributions of this paper to be as follows.

  • A feature engineering method is developed to transfer sensor-based time series data into different images, by unifying the global and local features from time series.

  • A fusion framework is proposed to automatically extract image features from the generated images and to recognize user behavior by distinguishing different image features.
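The pixel-wise correspondence underlying the fusion framework can be sketched minimally: aligned activations from two sensor branches are combined elementwise before subsequent layers. The shapes and the additive combination below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def fuse_pixelwise(feat_a, feat_b):
    """Combine two aligned feature maps elementwise (pixel-wise fusion)."""
    if feat_a.shape != feat_b.shape:
        raise ValueError("pixel-wise fusion requires identically shaped maps")
    return feat_a + feat_b

# Hypothetical H x W x C activations from two sensor branches,
# e.g. an accelerometer branch and a gyroscope branch.
acc_feat = np.ones((8, 8, 16))
gyr_feat = np.full((8, 8, 16), 2.0)
fused = fuse_pixelwise(acc_feat, gyr_feat)   # every entry becomes 3.0
```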

The remainder of this paper is organized as follows. In the next section, we briefly review related work. Our proposed approach is described in Section 3. In Section 4, we evaluate the proposed framework on both the HHAR [19] and MHEALTH [20] datasets and describe the findings. Specifically, the findings demonstrate that our proposed approach works well with most types of heterogeneous multi-dimensional time series measurements. Finally, we conclude the paper in the last section.

Section snippets

Related work

In this section, we will briefly review the related literature on imaging time series, image recognition, and the processing of heterogeneous sources.

Imaging Time Series. Encoding time series as images plays an important role in many classification tasks. In [21], for example, the authors investigated the use of recurrence plots as data representation for time series classification. In their approach, texture features are extracted on recurrence patterns from time series by applying visual …

Proposed architecture

In this section, we first present preliminaries on the Gramian Angular Field (GAF) and ResNet. Then, for our activity recognition task, we explore feature engineering with GAF images and present the proposed fusion ResNet framework.
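As a sketch of the imaging step, the summation variant of the Gramian Angular Field (GASF) can be computed as below. This is the standard construction from Wang and Oates [17]; how the paper pairs such encodings into its two-channel image is not shown in this snippet, and the function name is our own.

```python
import numpy as np

def gramian_angular_field(series):
    """Encode a 1-D series as a Gramian Angular Summation Field image.

    Standard GASF construction:
    1. rescale the series into [-1, 1];
    2. map each value to a polar angle phi = arccos(x);
    3. form the image G[i, j] = cos(phi_i + phi_j).
    """
    x = np.asarray(series, dtype=float)
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0  # rescale to [-1, 1]
    x = np.clip(x, -1.0, 1.0)                            # guard rounding error
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])           # n x n image
```

An n-point series thus becomes an n-by-n image whose diagonal preserves the rescaled values (via cos(2·phi_i)), while off-diagonal entries capture pairwise temporal correlations.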

Evaluation

The following datasets are used in our evaluation.

HHAR. The Heterogeneity Human Activity Recognition (HHAR) dataset, collected from smartphones and smartwatches, is devised to benchmark human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.) in real-world contexts. Specifically, the dataset is gathered with a variety of device models and use scenarios, in order to reflect the sensing heterogeneity to be expected in …
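A typical preprocessing step for such sensor streams is sliding-window segmentation before imaging or feature extraction. The window length and overlap below are illustrative, not the paper's values.

```python
import numpy as np

def sliding_windows(samples, window, step):
    """Segment a (T, channels) sensor stream into overlapping windows."""
    samples = np.asarray(samples)
    starts = range(0, len(samples) - window + 1, step)
    return np.stack([samples[s:s + window] for s in starts])

# e.g. a tri-axial accelerometer stream of 500 samples,
# cut into 128-sample windows with 50% overlap (illustrative sizes)
stream = np.zeros((500, 3))
wins = sliding_windows(stream, window=128, step=64)  # shape (6, 128, 3)
```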

Conclusion

In this paper, a deep learning network architecture was proposed for human activity recognition based on mobile sensor data. Specifically, we proposed a novel method to encode time series into GAF images by unifying global and local time series features. This new processing method allows the data to be trained in mainstream image recognition residual networks. We designed a number of experiments to verify the feasibility of imaging time series, and we proposed a fusion ResNet to solve the problem of …

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (No. 61672135), the National Science Foundation of China - Guangdong Joint Foundation (No. U1401257), the Sichuan Science-Technology Support Plan Program (No. 2018GZ0236 and No. 2017FZ0004), and the Fundamental Research Funds for the Central Universities (No. 2672018ZYGX2018J057 and No. ZYGX2015KYQD136).

References (43)

  • G. Fortino et al.

    A framework for collaborative computing and multi-sensor data fusion in body sensor networks

    Inform. Fusion

    (2015)
  • G. Fortino et al.

SPINE2: developing BSN applications on heterogeneous sensor nodes

    SIES

    (2009)
  • H. Ghasemzadeh et al.

    Power-aware activity monitoring using distributed wearable sensors

    IEEE Trans. Human Mach. Syst.

    (2014)
  • A. Bulling et al.

    A tutorial on human activity recognition using body-worn inertial sensors

    ACM Comput. Surv. (CSUR)

    (2014)
  • G. Fortino et al.

    Enabling effective programming and flexible management of efficient body sensor network applications

    IEEE Trans. Human Mach. Syst.

    (2013)
  • L. Bao et al.

    Activity recognition from user-annotated acceleration data

  • D.F. Silva et al.

    Time series classification using compression distance of recurrence plots

    2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, December 7–10, 2013

    (2013)
  • H. Hermansky

Perceptual linear predictive (PLP) analysis of speech

    J. Acoust. Soc. Am.

    (1990)
  • Z. Wang et al.

    Imaging time-series to improve classification and imputation

International Joint Conference on Artificial Intelligence (IJCAI)

    (2015)
  • G. Baldini et al.

    Imaging time series for internet of things radio frequency fingerprinting

    International Carnahan Conference on Security Technology

    (2017)
  • A. Stisen et al.

Smart devices are different: assessing and mitigating mobile sensing heterogeneities for activity recognition

    ACM Conference on Embedded Networked Sensor Systems

    (2015)