An approach to evaluating motion pattern detection techniques in spatio-temporal data

https://doi.org/10.1016/j.compenvurbsys.2005.09.001

Abstract

This paper presents a method to evaluate a geographic knowledge discovery approach for exploring the motion of point objects. The goal is to provide a means of considering the significance of motion patterns, described through their interestingness. We use Monte-Carlo simulations of constrained random walks to generate populations of synthetic lifelines, using the statistical properties of real observational data as constraints. Pattern occurrence in the synthetic data is then compared with observational data to assess the potential interestingness of the found patterns. We use motion data from wildlife biology and spatialisation in political science for the evaluation. The results of the numerical experiments show that the interestingness of found motion patterns is largely dependent on the configuration of the pattern matching process, which includes the pattern extent, the temporal granularity, and the classification schema used for the motion attributes azimuth and speed. The results allow interestingness to be attached only to some of the patterns found; the remaining patterns appear not to be interesting. The evaluation method helps in estimating useful configurations of the pattern detection process. This work emphasises the need to further investigate the statistical aspects of the problem under study in (geographic) knowledge discovery.

Introduction

Location aware devices are becoming ubiquitous and will increase our capability to collect spatio-temporal motion data by many orders of magnitude. The ubiquity of such devices is reflected by the fact that in the summer of 2004 the Japanese Telecommunications Council declared that all mobile phones introduced in Japan after 2007 should have self-locating functionality. Studies of so-called moving point objects (MPO), incorporating information about changing positions of discrete objects in time and space, have been identified as a key emerging research area in GIScience (Miller, 2003). By studying MPOs through time, individual geospatial lifelines can be derived from large datasets collected, for example, from people carrying GPS-enabled phones and PDAs (e.g., Dykes & Mountain, 2003), tracked animals in field studies (e.g., Wentz, Campell, & Houston, 2003), or even tracked football players in sports scene analysis (e.g., Iwase & Saito, 2003).

It has been recognised not only that spatial data is special, but also that handling and analysing spatio-temporal data, and above all motion data, requires the development of new concepts (Frank, 2001; Mark, 2003). Traditional analytical methods for spatial and spatio-temporal data were developed in an era when data collection was expensive and computational power was weak (Miller & Han, 2001). Miller and Han thus reason that “traditional spatial analytical techniques cannot easily discover new and unexpected patterns, trends and relationships that can be hidden deep within very large and diverse geographic datasets” (Miller & Han, 2001, p. 3).

Integrating GIScience knowledge about space–time with the emerging field of knowledge discovery in databases opens up the possibility of geographic knowledge discovery (GKD). Applications which generate large volumes of spatio-temporal data, such as high-resolution (in time and space) satellite-based systems (Griffiths & Mather, 2000) and, in our case, the analysis of geospatial lifelines (Hornsby and Egenhofer, 2002; Mark, 1998), face multiple challenges in the storage and exploration of high-volume spatio-temporal datasets and thus present excellent cases for the application of GKD.

It has been recognised in the knowledge discovery in databases (KDD) literature that discovery systems can generate a glut of patterns, most of which are of no interest to the user (Silberschatz and Tuzhilin, 1996; Padmanabhan, 2004). Thus, it is recommended that data mining be carried out with regard to the statistical aspects of the problem (Fayyad, Piatetsky-Shapiro, & Smyth, 1996).

This paper presents work that extends the relative motion (REMO) GKD approach developed to identify motion patterns in groups of MPOs (Laube and Imfeld, 2002; Laube et al., 2004; Laube et al., 2005). The REMO approach proposed so far (Section 2.2) allows the user to search large volumes of data for instances of pre-defined motion patterns, which are constructed on the basis of existing knowledge about the motion of the objects under study. However, the user has no means by which to estimate the significance (e.g., the uniqueness) of the extracted patterns. The number and the extent of patterns found may depend significantly on both the motion data and the parameterisation of the pattern detection process. Many more patterns may be identified in a space where motion is constrained, for example, on a football pitch, than in the seemingly chaotic motion of children in a playground. A central question for GKD is therefore how the significance of patterns extracted from such cases can be assessed. The aim of this paper is to provide a means of considering the significance of REMO patterns.

Our approach focuses on the use of Monte-Carlo simulations to generate synthetic lifelines constrained by the statistical properties of real observational data. Pattern occurrence in the simulated data is then compared with observational data to assess the potential significance of the patterns. Finally, having identified potentially interesting patterns we return to the observational data to investigate whether these patterns have meaning in terms of the system under investigation.
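The comparison between observed and simulated pattern occurrence can be illustrated with a minimal sketch. It assumes that pattern occurrence is summarised as one count per dataset and that unusualness is expressed as a one-sided Monte-Carlo p-value; neither the function name `monte_carlo_p_value` nor the example counts come from the paper, and the paper's own comparison may differ.

```python
def monte_carlo_p_value(observed_count, simulated_counts):
    """Estimate how unusual an observed pattern count is.

    Returns the share of runs (all simulations plus the observation
    itself) whose pattern count is at least as large as the observed one.
    This estimator is an assumption made for illustration.
    """
    if not simulated_counts:
        raise ValueError("need at least one simulated count")
    as_extreme = sum(1 for c in simulated_counts if c >= observed_count)
    return (as_extreme + 1) / (len(simulated_counts) + 1)


# Hypothetical numbers: 12 pattern instances in the observed data and
# the pattern counts from 9 simulated populations of lifelines.
simulated = [3, 5, 4, 6, 2, 7, 5, 4, 12]
p = monte_carlo_p_value(12, simulated)
```

A small value suggests that the observed pattern count is rarely reached by the constrained random walks, which is one way to read a pattern as potentially interesting.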

The paper is structured as follows. Section 2 provides a literature overview on geographic knowledge discovery in general, the REMO GKD approach, pattern interestingness, and different potential approaches to simulating geospatial lifelines. In Section 3 the central ideas of our evaluation approach are introduced. The data used in the study are introduced in Section 4. Section 5 describes the methodology for generating synthetic lifelines as constrained random walks, and their use in Monte-Carlo simulation for numerical experiments. The results of these experiments are presented in Section 6, and their meaning and application to the observational data are discussed in Section 7 before examining the general applicability of these results to the field of GKD.

Section snippets

Geographic knowledge discovery

Knowledge discovery in databases (KDD) is a set of methods for identifying high-level knowledge from low-level data in the context of large datasets. As an interdisciplinary approach it integrates methods from machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, data visualisation, and high-performance computing (Fayyad et al., 1996). KDD moves beyond the traditional domain of statistics to accommodate data normally

Evaluating the REMO approach

Hand (2004) defined pattern discovery as the search for anomalous features of the data, departing from the expected. However, the challenge lies in defining the expected. If an unexpected pattern is also unique, then the user can reasonably examine its interrelationships and infer meaningfulness. If, on the other hand, an unexpected pattern occurs many times, how can we first assess whether it is not only unexpected but also unusual, given the parameterisation of the data? The first step in assessing such

Data

Two contrasting datasets are used in evaluating the REMO GKD approach. One of these consists of wildlife data, with a relatively small number of objects, but well understood behaviour. The other dataset is a classic example of spatialisation, where aspatial attributes are projected into a geographic space. This second dataset consists of many more objects and time steps than our wildlife data, and is a typical example of a dataset where GKD might be expected to reveal hitherto unseen patterns.

Simulating lifelines using constrained random walk

To generate simulations with properties as close as possible to the observed motion phenomena, we use constraints derived from the observation data. The constraints are given as frequency distributions of step length and direction change (Fig. 4, Fig. 5, Fig. 6, Fig. 7). This establishes an empirical link between the simulated and real data, without which the utility of the random walk model is considered severely restricted (Turchin, 1998).
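A constrained random walk of this kind might be sketched as follows. The sketch assumes that the frequency distributions are represented simply by the lists of observed step lengths and direction changes, so that drawing uniformly from those lists reproduces their empirical frequencies; the function name, parameters, and sample values are hypothetical, and the paper's actual construction may differ.

```python
import math
import random


def constrained_random_walk(step_lengths, turn_angles_deg, n_steps,
                            start=(0.0, 0.0), heading_deg=0.0, rng=None):
    """Generate one synthetic lifeline as a list of (x, y) fixes.

    Each step resamples a direction change and a step length from the
    observed values (an assumption made for this illustration).
    Headings are azimuths in degrees, clockwise from north.
    """
    rng = rng or random.Random()
    x, y = start
    heading = heading_deg
    path = [(x, y)]
    for _ in range(n_steps):
        heading = (heading + rng.choice(turn_angles_deg)) % 360.0
        step = rng.choice(step_lengths)
        x += step * math.sin(math.radians(heading))  # east component
        y += step * math.cos(math.radians(heading))  # north component
        path.append((x, y))
    return path


# Hypothetical observed values, for illustration only.
observed_steps = [120.0, 80.0, 300.0, 150.0]
observed_turns = [-30.0, 10.0, 0.0, 45.0, -5.0]
lifeline = constrained_random_walk(observed_steps, observed_turns,
                                   n_steps=20, rng=random.Random(1))
```

Repeating the call with different random seeds yields a population of synthetic lifelines that share the step-length and direction-change statistics of the observations.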

For the construction of synthetic

Results of the numerical experiments

In this paper we present results from experiments performed, with varying pattern extension and attribute granularity, in order to evaluate the interestingness of the following patterns: constancy, concurrence and trend-setter (Table 1).
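The effect of attribute granularity on pattern detection can be illustrated with a small sketch. It assumes that azimuth is classified into equal-width sectors and that a constancy pattern is a run of consecutive time steps in which one object stays in the same azimuth class; this reading of constancy, the function names, and the sample azimuths are assumptions for illustration and are not taken from Table 1.

```python
def classify_azimuth(azimuth_deg, n_classes=8):
    """Map an azimuth in degrees to a class index 0..n_classes-1.

    Class 0 is centred on north (0 degrees); classes proceed clockwise.
    """
    width = 360.0 / n_classes
    return int(((azimuth_deg + width / 2.0) % 360.0) // width)


def find_constancy(azimuths_deg, min_length, n_classes=8):
    """Return (start_index, run_length, class_index) for every run of
    at least `min_length` consecutive steps in the same azimuth class."""
    classes = [classify_azimuth(a, n_classes) for a in azimuths_deg]
    runs = []
    start = 0
    for i in range(1, len(classes) + 1):
        # Close the current run at the end of the data or on a class change.
        if i == len(classes) or classes[i] != classes[start]:
            if i - start >= min_length:
                runs.append((start, i - start, classes[start]))
            start = i
    return runs


# Hypothetical azimuths of one object over three time steps.
azimuths = [10.0, 40.0, 30.0]
fine = find_constancy(azimuths, min_length=3, n_classes=8)    # no run found
coarse = find_constancy(azimuths, min_length=3, n_classes=4)  # one run found
```

With eight classes the three steps fall into two different sectors, whereas with four classes they share one sector, so the same lifeline yields a constancy pattern only at the coarser granularity.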

The results we present in this section involve the motion properties motion azimuth and speed, focussing on azimuth. Although we performed experiments with a wide range of different attribute granularities, we selected for this paper the two most commonly

Discussion

In discussing the results shown in Section 6, we consider three different aspects of the study. We firstly seek to interpret the plots derived from comparison of the properties of the observed data with the simulations, without consideration of the context of the data. That is to say, we do not take note of whether our MPOs are caribou or Swiss political districts in this first discussion. Secondly, the results from the simulations are used to identify potentially interesting patterns within

Main contributions

In this paper we presented a method to assess qualitative data mining measures of interestingness in geographic knowledge discovery. The GKD approach under study is the REMO GKD, developed to detect motion patterns in the lifelines of moving point objects. We propose a procedure to estimate the interestingness of the motion patterns found in lifeline observation data. To do so, we first generate populations of constrained random walks using Monte-Carlo simulations. Secondly, we compare the

Acknowledgements

The authors wish to thank Michael Hermann and Heiri Leuthold, University of Zurich, for providing the Swiss districts data, and the Porcupine Caribou Technical Committee, the Porcupine Caribou Management Board and the Wildlife Management Advisory Council (North Slope) for providing the excellent caribou data. The authors would furthermore like to acknowledge invaluable input from Nikos Koutsias, Robert Weibel, and Stephan Imfeld, all University of Zurich.

References (39)

  • J. Dijkstra et al. A multi-agent cellular automata system for visualising simulated pedestrian activity.
  • S.G. Fancy et al. Seasonal movement of caribou in arctic Alaska as determined by satellite. Canadian Journal of Zoology (1989).
  • S.G. Fancy et al. Selection of calving sites by Porcupine herd caribou. Canadian Journal of Zoology (1991).
  • U. Fayyad et al. From data mining to knowledge discovery in databases. AI Magazine (1996).
  • A.U. Frank. Socio-economic units: their life and motion.
  • Griffith, B., Douglas, D. C., Walsh, N. E., Yound, D. D., McCabe, T. R., Russell, D. E., et al. (2002). The Porcupine...
  • G.H. Griffiths et al. Remote sensing and landscape ecology: landscape patterns and landscape change. International Journal of Remote Sensing (2000).
  • D.J. Hand. Pattern discovery. Journal of Applied Statistics (2004).
  • D. Hand et al. Principles of data mining (2001).
1 Tel.: +41 44 635 6531; fax: +41 44 635 6848.
