Elsevier

Information Systems

Volume 57, April 2016, Pages 207-222

Labeling sensing data for mobility modeling

https://doi.org/10.1016/j.is.2015.09.001

Abstract

In urban environments, sensory data can be used to create personalized models for predicting efficient routes and schedules on a daily basis, and also at the city level to manage and plan more efficient transport and to schedule maintenance and events. Raw sensory data is typically collected as time-stamped sequences of records, with additional activity annotations by a human, but in machine learning, predictive models view data as labeled instances and depend upon reliable labels for learning. In real-world sensor applications, human annotations are inherently sparse and noisy. This paper presents a methodology for preprocessing sensory data for predictive modeling, in particular with respect to creating reliable labeled instances. We analyze real-world scenarios and the specific problems they entail, and experiment with different approaches, showing that a relatively simple framework can ensure quality labeled data for supervised learning. We conclude the study with recommendations to practitioners and a discussion of future challenges.

Introduction

The availability and penetration of smart mobile devices are increasing; smartphone penetration in Europe is already more than 50% [1], and is forecast to continue growing at a double-digit annual rate through to the end of 2017. Mobile sensing systems are finding their way into many application areas, such as monitoring human behavior, social interactions, commerce, health monitoring, traffic monitoring, and environmental monitoring [2].

The pervasiveness of mobile phones and the fact that they are equipped with many sensor modalities make them ideal sensing devices. Since mobile phones are personal devices, we can use the idea of mobile sensing to probe the owner of the phone and the environment in which the user is moving. Our general interest is to use mobile phones to learn about the mobility patterns of people, and to reason about and predict those patterns in urban traffic environments.

The idea of using mobile phones as sensors is not new: mobile phones were already used for context recognition (e.g., [3]) and for measuring social interactions in complex social systems (e.g., [4]) about a decade ago.

Nowadays, smartphones are equipped with a wide range of sensors, including motion, location and environment sensors, that allow the collection of rich observational data about human mobility in urban areas. Various predictive modeling tasks can be formulated based on such data. For example, one can be interested in recognizing the current activity of a person [5], their levels of stress or depression [6] or other metrics of health, predicting the next location [7], or predicting a trajectory of movement [8], [9].

In this study, we present a methodology for preprocessing such sensory data for machine learning purposes and its use for analyzing, modeling and predicting human mobility in urban areas. Note that although our experiments involve activity recognition, solving this particular task is not our focus. There is already considerable literature on this topic (see, e.g., [5], [10], [11]). Rather, we focus on cleaning partially labeled data, and general analytics and classification of this data, in particular with respect to the manual annotations. The goal is to ensure a degree of reliability such that the data can be used by supervised learning algorithms.

The main contributions of this study are a survey of the tasks involved in mobile sensing in urban environments, carried out via case-study identification of issues that arise in this domain; the formulation of a methodology for preprocessing and cleaning sensory data for predictive modeling, in particular with respect to creating reliable labeled instances; and the highlighting of important questions for future research. We focus on the need to automate the process of cleaning and preprocessing, rather than relying on human analysis. This paper extends the preliminary report of [12].

We continue the paper with Section 2, which gives an overview of our methodological approach. The following sections are organized with respect to the plates of Fig. 1(b): Section 3 deals with preprocessing for aggregation and fusion of both the input data and the output data (the latter case we term simply ‘labeling’), to form a set of time-indexed instances. Section 4 outlines a general methodology for the intermixed process of cleaning and classification of data. Section 5 deals with some of the analytical and modeling issues that can be approached once reliable labeled data is available. Section 6 discusses overall results obtained from the experiments throughout the paper, offers recommendations to practitioners, and comments on future work. Finally, Section 7 provides conclusions.

Preprocessing methodology

We begin by presenting a methodological approach at a conceptual level. Then, in the following sections, we discuss the corresponding algorithmic techniques to be used at the different steps of the preprocessing process.

Data fusion/aggregation and labeling

We start our discussion with the first steps of the data preprocessing methodology from Fig. 1(b) – namely fusion/aggregation and labeling – covered in Sections 3.1 (Fusion and alignment: from raw sensor readings to instances) and 3.2 (Labeling: from human annotations to instance labels), respectively. We deal with these steps in the same section on account of their relatedness: sensor fusion involves data alignment, and labeling is simply data alignment in the label space; both processes output a time-indexed

Data cleaning

The task of data cleaning involves taking the noisily labeled input instances $(X, \tilde{Y})$ and producing a clean version. We can also view the data as a stream $\{(x_t, \tilde{y}_t)\}_{t=1}^{T}$ (where possibly $T = \infty$), on account of the strong time context, and because in our analysis we consider algorithms that work either incrementally online or inside a moving window. Processing can rarely be totally offline in ongoing mobile sensor applications.
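
To make the idea of cleaning a label stream inside a moving window concrete, the following is a minimal sketch and not the method of this paper. It assumes that missing annotations are encoded as None and swaps in a plain majority vote over the most recent labels as the cleaning rule; the function name clean_labels and the parameter window are hypothetical.

```python
from collections import Counter, deque
from typing import Hashable, Iterable, Iterator, Optional


def clean_labels(
    noisy: Iterable[Optional[Hashable]], window: int = 5
) -> Iterator[Optional[Hashable]]:
    """Yield one cleaned label per incoming noisy label.

    Each output is the most frequent non-missing label among the last
    `window` inputs (the current one included). If every label in the
    window is missing, the output is None.
    """
    recent = deque(maxlen=window)  # oldest labels drop out automatically
    for label in noisy:
        recent.append(label)
        # Count only annotated time steps; None marks a missing annotation.
        counts = Counter(l for l in recent if l is not None)
        yield counts.most_common(1)[0][0] if counts else None


# Illustrative stream (assumed data): one spurious "bus" annotation,
# followed by a stretch with no annotations at all.
stream = ["walk", "walk", "bus", "walk", "walk", None, None, None]
cleaned = list(clean_labels(stream, window=3))
print(cleaned)
```

Because the function is a generator that keeps only the last `window` labels, it can run online on an unbounded stream. In this example the isolated "bus" label is outvoted, the first two missing labels are filled with "walk", and the last label stays None because its window contains no annotation.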

As we discussed earlier, labels based on human annotation are

Modeling and analytics

In this section we review the classification and prediction tasks of interest that can be carried out with cleaned data.

Discussion: recommendations and open challenges

Our study addressed different challenges with data preprocessing, cleaning, and modeling. Our methodology for cleaning and learning from partially labeled sensor data from an urban environment was able to deal with most challenges presented by the case studies we looked at. It was flexible enough to incorporate a range of classification schemes, and thus to allow combination of several of the best practices from the literature. The approach was relatively simple, but it proved effective, and

Summary and conclusions

We presented a methodological framework for preprocessing sensory data for predictive modeling, and explored various possibilities for aggregation and cleaning of this data, and the challenges presented, from a machine learning point of view – in particular dealing with inherently sparse and noisy human labeling.

Our methodology for cleaning and learning from partially labeled sensor data was able to deal with most challenges presented by the case studies we looked at, involving activity

Acknowledgments

This work was supported by the Aalto University AEF research programme (http://energyefficiency.aalto.fi/en/), and Academy of Finland Grant 118653 (ALGODAN).

References (38)

  • Mobile Economy Europe 2014, Report, GSMA,...
  • W.Z. Khan et al., Mobile phone sensing systems: a survey, IEEE Commun. Surv. Tutor. (2013)
  • J. Himberg, K. Korpiaho, H. Mannila, J. Tikanmäki, H. Toivonen, Time series segmentation for context recognition in...
  • N. Eagle et al., Reality mining: sensing complex social systems, Pers. Ubiquitous Comput. (2006)
  • J.R. Kwapisz et al., Activity recognition using cell phone accelerometers, SIGKDD Explor. Newslett. (2011)
  • R. Wang, F. Chen, Z. Chen, T. Li, G. Harari, S. Tignor, X. Zhou, D. Ben-Zeev, A.T. Campbell, Studentlife: Assessing...
  • H. Gao, J. Tang, H. Liu, Mobile location prediction in spatio-temporal context, in: The Proceedings of Mobile Data...
  • A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti, Wherenext: a location predictor on trajectory pattern mining, in:...
  • O. Mazhelis, I. Žliobaitė, M. Pechenizkiy, Context-aware personal route recognition, in: Proceedings of the 14th...
  • L. Chen et al., Sensor-based activity recognition, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. (2012)
  • T. Plötz, N.Y. Hammerla, P. Olivier, Feature learning for activity recognition in ubiquitous computing, in: Proceedings...
  • I. Žliobaitė, J. Hollmén, Mobile sensing data for urban mobility analysis: a case study in preprocessing, in:...
  • R. Kohavi, F. Provost, Glossary of terms. Editorial for the special issue on applications of machine learning and the...
  • D. Figo et al., Preprocessing techniques for context recognition from accelerometer data, Pers. Ubiquitous Comput. (2010)
  • J. Zhang et al., Aggregating and sampling methods for processing GPS data streams for traffic state estimation, IEEE Trans. Intell. Transp. Syst. (2013)
  • M.-L. Zhang et al., A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng. (2014)
  • P. Mannonen, K. Karhu, M. Heiskala, An approach for understanding personal mobile ecosystem in everyday context, in:...
  • N. Aharony et al., Social fMRI: investigating and shaping social mechanisms in the real world, Pervas. Mob. Comput. (2011)