Detecting anomalies in spatio-temporal flow data by constructing dynamic neighbourhoods

https://doi.org/10.1016/j.compenvurbsys.2017.08.010

Highlights

  • The proposed algorithm is designed for anomaly detection in spatio-temporal flow data.

  • The spatio-temporal neighbourhoods are constructed based on the modelling of dynamic flow.

  • The proposed algorithm can accurately detect global and local spatio-temporal flow anomalies.

Abstract

In massive spatio-temporal datasets, anomalies that deviate from the global or local distributions are not just useless noise; they may indicate significant changes, surprising patterns, and meaningful insights. The detection of spatio-temporal anomalies has therefore become an important research topic in spatio-temporal data mining. For spatio-temporal flow data (e.g., traffic flow data), existing anomaly detection methods cannot handle the embedded dynamic characteristics. This paper therefore proposes constructing dynamic neighbourhoods to detect anomalies in spatio-temporal flow data (called spatio-temporal flow anomalies). In this approach, the dynamic spatio-temporal flow is first modelled based on the real-time attribute values of the flow data, e.g., the velocity of vehicles. The dynamic neighbourhoods are then constructed by considering attribute similarity in the spatio-temporal flow. On this basis, global and local anomalies are detected by employing the idea of the G statistic, and the problem of multiple hypothesis testing is addressed to control the false discovery rate. The effectiveness and practicality of the proposed approach are demonstrated through comparative experiments on traffic flow data from the road network of central London for both weekdays and weekends.
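The abstract mentions controlling the false discovery rate when many cells are tested at once. A minimal sketch of the Benjamini-Hochberg step-up procedure (Benjamini and Hochberg, 1995) is given below for orientation. The function name, the `alpha` default and the example p-values are illustrative assumptions, and the paper may apply a different FDR variant.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    the name and signature are assumptions, not taken from the paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten hypothetical spatio-temporal cells.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

Because the procedure is step-up, a p-value that fails its own threshold is still rejected when a larger p-value further down the ranking passes its threshold.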

Introduction

The wide usage of geo-location sensors and network connectivity makes it easier to capture enormous amounts of spatio-temporal data spanning certain spatial regions over a period of time. Spatio-temporal anomalies are records that deviate significantly from the global or local distributions with the consideration of non-spatial attributes (Shekhar, Lu, & Zhang, 2001). In most geographical fields, spatio-temporal anomalies are not just useless noise but imply significant changes, surprising patterns, and meaningful insights. Therefore, the detection of spatio-temporal anomalies has become a research hotspot in the field of spatio-temporal data mining (Miller and Han, 2009; Shekhar et al., 2009; Tan et al., 2006) and has received considerable attention in the detection of disease and crime hotspots (Kulldorff et al., 2005; Delmelle et al., 2014; Brunsdon et al., 2007; Nakaya and Yano, 2010; Shiode and Shiode, 2013; Cheng and Adepeju, 2013), the discovery of climate change (Sun et al., 2005; Barua and Alhajj, 2007; Wu et al., 2010; Liu et al., 2011; Telang et al., 2014), the monitoring of environmental change (Birant and Kut, 2006; Cheng and Li, 2006), the extraction of anomalous trajectories (Ge et al., 2010; Lee et al., 2008; Zhang et al., 2011) and the detection of traffic congestion (Li et al., 2007; Liu et al., 2011; Pang et al., 2011; Chawla et al., 2012; Pan et al., 2013).

The types of spatio-temporal data can vary with different applications. In general, existing methods for detecting spatio-temporal anomalies are mostly designed for spatio-temporal sequence data, e.g., climate spatio-temporal sequences. Spatial and non-spatial attributes are both embedded in the spatio-temporal sequences. Specifically, spatial attributes determine the geographical locations of entities by X-Y coordinates or latitude-longitude, while non-spatial attributes are described by time series. Taking Fig. 1(a) as an example, the entity P1 can be denoted as P1 = (x1, y1, P1 · nsat1, P1 · nsat2, …), where x1, y1 and P1 · nsat1, P1 · nsat2, … represent the spatial location and non-spatial attribute values of P1, respectively. In the real world, there exists a special kind of spatio-temporal sequence data that is both directional and dynamic, e.g., traffic flow data from a road network (also called ‘spatio-temporal flow data’ in this paper). In traffic flow data, the vehicles keep running on the roads, as shown in Fig. 1(b), and the traffic flow in the upstream direction can interact with that in the downstream direction. This paper specifically focuses on spatio-temporal flow data and presents the approach of constructing dynamic neighbourhoods to detect spatio-temporal flow anomalies. The major contributions are as follows:

  • Constructing dynamic spatio-temporal neighbourhoods.

  • Detecting various anomalies by integrating dynamic spatio-temporal neighbourhoods.
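To illustrate what a dynamic neighbourhood could look like, the sketch below follows the flow downstream from a starting road link and collects the (link, time step) cells that a vehicle could reach within a time horizon, given the speeds observed on each link in each time step. This is a simplified illustration under stated assumptions, not the paper's algorithm: the function name, the data layout and the example network are hypothetical, and the attribute-similarity criterion mentioned in the abstract is omitted.

```python
import heapq


def reachable_cells(downstream, length_m, speed_mps, start_link, start_step,
                    step_s, horizon_steps):
    """Collect (link, time_step) cells reachable downstream of a start cell.

    Hypothetical sketch: downstream maps a link to its successor links,
    length_m gives link lengths in metres, and speed_mps[step][link] is the
    observed speed (m/s) of a link during a time step of step_s seconds.
    """
    horizon_s = horizon_steps * step_s
    best = {start_link: 0.0}          # earliest entry time per link, in seconds
    heap = [(0.0, start_link)]
    cells = set()
    while heap:
        elapsed, link = heapq.heappop(heap)
        if elapsed > best.get(link, float("inf")):
            continue                  # stale heap entry
        step = start_step + int(elapsed // step_s)
        cells.add((link, step))
        speed = speed_mps[step][link]
        if speed <= 0:
            continue                  # blocked link: flow does not propagate
        exit_time = elapsed + length_m[link] / speed
        if exit_time >= horizon_s:
            continue                  # leaves the link after the horizon
        for nxt in downstream.get(link, ()):
            if exit_time < best.get(nxt, float("inf")):
                best[nxt] = exit_time
                heapq.heappush(heap, (exit_time, nxt))
    cells.discard((start_link, start_step))
    return cells


# Made-up three-link corridor A -> B -> C with 5-minute time steps.
downstream = {"A": ["B"], "B": ["C"], "C": []}
length_m = {"A": 300.0, "B": 300.0, "C": 300.0}
free_flow = [{"A": 10.0, "B": 10.0, "C": 10.0}] * 2
congested = [{"A": 1.0, "B": 1.0, "C": 1.0}] * 2

free_nb = reachable_cells(downstream, length_m, free_flow, "A", 0, 300, 2)
slow_nb = reachable_cells(downstream, length_m, congested, "A", 0, 300, 2)
```

In this toy example the free-flow neighbourhood of cell (A, 0) covers both downstream links within the same time step, whereas under congestion it shrinks to link B in the next time step. This shows how a neighbourhood built from the real-time state of the flow differs from a fixed topological one.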

The rest of this paper is organized as follows. Section 2 reviews related work on the detection of spatio-temporal anomalies and proposes our strategy for detecting spatio-temporal flow anomalies. In Section 3, the proposed method for detecting spatio-temporal flow anomalies is fully elaborated. In Section 4, extensive experiments on real-life data are analysed to demonstrate the effectiveness and practicability of the proposed method. The main findings are summarized and directions for future research are highlighted in Section 5.

Section snippets

Related work

Hawkins first defined an anomaly as ‘an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism’ (Hawkins, 1980). Because there are enormous collections of spatio-temporal datasets, Shekhar proposed a method of detecting anomalies in spatio-temporal datasets. He defined a spatio-temporal anomaly as ‘an entity whose non-spatial attribute values are significantly different from those in its spatio-temporal neighbourhood’ (

Detection of spatio-temporal flow anomalies based on dynamic neighbourhoods

In this section, the proposed method will be elaborated by taking traffic flow data as an example. Section 3.1 introduces the structure of traffic flow data on the road network. Section 3.2 gives the process of modelling dynamic spatio-temporal flow. The construction of dynamic spatio-temporal neighbourhoods and detection of spatio-temporal flow anomalies are consecutively described in Sections 3.3 and 3.4. The implementation of the proposed method is given in Section 3.5.
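As a rough illustration of the detection step, the sketch below computes the standard Getis-Ord Gi* z-score with binary weights over a given neighbourhood for each cell. The paper states only that it employs "the idea of the G statistic", so the exact statistic used there may differ. The function name, the dictionary-based data layout and the example speeds are assumptions made for this example.

```python
import math


def local_g_star(values, neighbourhoods):
    """Gi* z-score per cell, using binary weights over each neighbourhood.

    Hypothetical sketch: values maps each cell to its attribute value, and
    neighbourhoods maps a cell to the cells in its neighbourhood. The cell
    itself is always included, as in the Gi* form of the statistic.
    """
    n = len(values)
    mean = sum(values.values()) / n
    # Population variance, clamped at zero to guard against rounding error.
    variance = max(sum(v * v for v in values.values()) / n - mean * mean, 0.0)
    std = math.sqrt(variance)
    scores = {}
    for cell, members in neighbourhoods.items():
        members = set(members) | {cell}
        k = len(members)              # sum of the binary weights
        if std == 0 or k == n:
            scores[cell] = 0.0        # statistic undefined: no local contrast
            continue
        local_sum = sum(values[m] for m in members)
        denom = std * math.sqrt((n * k - k * k) / (n - 1))
        scores[cell] = (local_sum - mean * k) / denom
    return scores


# Made-up speeds (km/h) for ten cells; c6 and c7 form a slow cluster.
speeds = {"c0": 50, "c1": 52, "c2": 48, "c3": 51, "c4": 49,
          "c5": 50, "c6": 10, "c7": 12, "c8": 51, "c9": 49}
scores = local_g_star(speeds, {"c6": ["c7"], "c0": ["c1"]})
```

A strongly negative score marks a neighbourhood whose values are unusually low relative to the whole dataset, and a strongly positive score marks an unusually high one. The resulting p-values would then be subject to the multiple-testing correction mentioned in the abstract.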

Experimental comparisons and analysis

By performing experiments on real-life datasets, the effectiveness and practicality of the proposed method will be demonstrated in this section. Section 4.1 describes the real-life datasets utilized in the experiments. In Section 4.2, the selection of parameters is elaborated. Sections 4.3 and 4.4 perform comparisons with other methods and give an analysis of the experimental results obtained from two groups of datasets.

Conclusions and future work

In this paper, a novel approach of detecting spatio-temporal flow anomalies is proposed by constructing dynamic spatio-temporal neighbourhoods. A process of dynamic spatio-temporal flow is first modelled by combining topological relationships in the network with the real-time status of spatio-temporal flow. With the consideration of spatio-temporal reachability and spatio-temporal connectivity in the dynamic flow, the dynamic spatio-temporal neighbourhoods for all spatio-temporal cells can be

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC), Nos. 41601424, 41730105 and 41471385; the National Key Research and Development Foundation of China, No. 2017YFB0503601.

References (40)

  • D. Birant et al.

    ST-DBSCAN: An algorithm for clustering spatial-temporal data

    Data & Knowledge Engineering

    (2007)
  • C. Brunsdon et al.

    Visualising space and time in crime patterns: A comparison of methods

    Computers, Environment and Urban Systems

    (2007)
  • Q. Liu et al.

    Spatio-temporal outliers detection within the space-time framework

    Journal of Remote Sensing

    (2011)
  • S. Barua et al.

    Parallel wavelet transform for spatio-temporal outlier detection in large meteorological data

    Intelligent Data Engineering and Automated Learning-IDEAL

    (2007)
  • Y. Benjamini et al.

    Controlling the false discovery rate: A practical and powerful approach to multiple testing

    Journal of the Royal Statistical Society

    (1995)
  • D. Birant et al.

    Spatio-temporal outlier detection in large databases

    Journal of Computing and Information Technology

    (2006)
  • C. Brunsdon et al.

    An assessment of the effectiveness of multiple hypothesis testing for geographical anomaly detection

    Environment and Planning B: Planning and Design

    (2011)
  • M. Caldas de Castro et al.

    Controlling the false discovery rate: A new application to account for multiple and dependent tests in local statistics of spatial association

    Geographical Analysis

    (2006)
  • S. Chawla et al.

    Inferring the root cause in road traffic anomalies

  • T. Cheng et al.

    Detecting emerging space-time crime patterns by prospective STSS

  • T. Cheng et al.

    Spatio-temporal autocorrelation of road network data

    Journal of Geographical Systems

    (2012)
  • T. Cheng et al.

    A multiscale approach for spatio-temporal outlier detection

    Transactions in GIS

    (2006)
  • T. Cheng et al.

    A dynamic spatial weight matrix and localized space–time autoregressive integrated moving average for network modeling

    Geographical Analysis

    (2014)
  • E. Delmelle et al.

    Visualizing the impact of space-time uncertainties on dengue fever patterns

    International Journal of Geographical Information Science

    (2014)
  • M. Deng et al.

    A general method of spatio-temporal clustering analysis

    Science China Information Sciences

    (2013)
  • Y. Ge et al.

    TOP-EYE: Top-k evolving trajectory outlier detection

  • A. Getis et al.

    The analysis of spatial association by use of distance statistics

    Geographical Analysis

    (1992)
  • D.M. Hawkins

    Identification of outliers

    (1980)
  • J.M. Kang et al.

    Discovering teleconnected flow anomalies: A relationship analysis of dynamic neighborhoods (RAD) approach

  • J.M. Kang et al.

    Discovering flow anomalies: A SWEET approach

Cited by (27)

    • A space-time flow LISA approach for panel flow data

      2023, Computers, Environment and Urban Systems
    • Revealing spatiotemporal matching patterns between traffic flux and road resources using big geodata - A case study of Beijing

      2022, Cities
      Citation Excerpt :

      Temporally, traffic flow was much higher at specific times (e.g., morning and evening peak) (Caceres et al., 2012; Tang et al., 2020), manifesting clusters which were found to demonstrate a temporal regularity caused by regular commuting (Ahas et al., 2010; Sevtsuk & Ratti, 2010). Spatially, based on the detection of traffic anomalies (Chawla et al., 2012; Djenouri et al., 2019; Shi et al., 2018) and high traffic flow patterns (Kharrat et al., 2009; Zheng et al., 2016), anomalies or clusters of traffic flow were discovered on arterial roads and in densely populated areas. The aforementioned studies revealed the disequilibrium of distribution of traffic flow and road resources respectively, which could partially explain traffic congestion.

    • Intelligent deep fusion network for urban traffic flow anomaly identification

      2022, Computer Communications
      Citation Excerpt :

      Mahalanobis distance is used to determine the similarity of individual traffic flows recorded at each second within different time frames. Shi et al. [18] proposed a dynamic neighborhood-based technique to identify local anomalies in spatiotemporal traffic flow data. The dynamic flow is first represented by the real-time vehicle speed data.

    • Estimating congestion zones and travel time indexes based on the floating car data

      2021, Computers, Environment and Urban Systems
      Citation Excerpt :

      The summary of the reviewed papers is presented in Table 1. Most of the state-of-the-art methods for traffic congestion estimation are based on the Global Positioning System (GPS) data (Kan et al., 2019; Kong et al., 2016; Zhao & Hu, 2019), but there are also the ones that use automatic number plate recognition system (video) (Shi, Deng, Yang, & Gong, 2018), connected vehicle technology (Zheng & Liu, 2017), Bluetooth (Beliakov, Gagolewski, James, Pace, Pastorello, Thilliez, & Vasa, 2018), loop detectors, etc. Regarding the acquisition cost, FCD have relatively low acquisition cost compared to the other data sources and cover a larger road network, which was suitable for the methodology proposed in our study.

    • Detecting anomalous spatial interaction patterns by maximizing urban population carrying capacity

      2021, Computers, Environment and Urban Systems
      Citation Excerpt :

      From the dynamic microscopic perspective, some researchers modeled the moving behavior on single or multiple objects mathematically to recognize anomalous inter-area interactions (Huang, 2015; Yuan, Liu, & Wei, 2017). Shi et al. (2018) designed an approach to identify anomalous patterns from traffic flow data by constructing dynamic neighborhoods (Shi, Deng, et al., 2018). Liu, Wu, et al. (2020) developed a network-constrained clustering method to statistically detect significant source or sink areas.

    • Discovering traffic congestion through traffic flow patterns generated by moving object trajectories

      2020, Computers, Environment and Urban Systems
      Citation Excerpt :

      Additionally, we consider a group of papers about the discovery of traffic anomalies and outliers. These can be discovered by comparing the flow inside the trajectories (Shi, Deng, Yang, & Gong, 2018; Wang, Wen, Yi, Zhu, & Sun, 2017) or between extracted regions (Chawla, Zheng, & Hu, 2012). Other examples are a search algorithm based on proposed distance metrics (Lu, Chen, & Hancock, 2009), the discovery of gathering patterns (Zheng, Zheng, Yuan, & Shang, 2013), or traffic outliers according to subnetworks (Dang, Silva, Singh, Swami, & Basu, 2016).
