Elsevier

Knowledge-Based Systems

Volume 189, 15 February 2020, 105076
Mining distinct and contiguous sequential patterns from large vehicle trajectories

https://doi.org/10.1016/j.knosys.2019.105076

Abstract

We focus on the problem of using contiguous SPM to extract succinct, redundancy-controlled patterns from large vehicle trajectories. Although several techniques exist to reduce the contiguous sequential pattern output, such as closed and max SPM, they still produce massive redundant pattern outputs when the input sequence database is sufficiently large and homogeneous, as is often the case for vehicle trajectories. Therefore, in this work we propose DC-SPAN: a distinct contiguous SPM algorithm. DC-SPAN mines a set of sequential patterns where the maximum redundancy of the pattern output is controlled by a user-specified parameter. Through various experiments using real-world trajectory datasets we show that DC-SPAN effectively controls the redundancy of the pattern output with trade-offs in pattern distinctness. Our experiments also indicate that DC-SPAN efficiently computes these patterns, incurring only a marginal running time cost over existing state-of-the-art contiguous SPM approaches. Lastly, because the pattern output is less redundant and more succinct, we also briefly explore visualisation as a useful technique to interpret the discovered vehicle routes.

Introduction

Due to the affordability and widespread availability of GPS technology, the generation and collection of vehicle movements, or trajectories, are relatively straightforward and cost-effective. These vehicle trajectories present a valuable opportunity to extract knowledge in domains such as urban planning [1], route planning [2], and traffic congestion [3]. In this work we focus on extracting this knowledge from vehicle trajectories through Sequential Pattern Mining (SPM). SPM is the process of finding frequently occurring sequences within a sequence database. However, SPM of vehicle trajectories is difficult for two reasons: (1) SPM requires sequences of discrete items; however, because GPS technology suffers from spatial uncertainty and urban black holes, the trajectory recordings are often quite noisy and far from discrete; (2) vehicle trajectories commonly contain hundreds of thousands, if not millions, of recordings, which cause many existing SPM approaches to produce massive, redundant, and therefore incomprehensible pattern outputs [4], [5].

The first problem is solved in other work [6] that constrains the vehicle trajectories to the appropriate road network as a pre-processing step. Doing so removes spatial uncertainty and converts the trajectories into discrete sequences of road node visitations. However, the second problem of mining a smaller, less redundant set of sequential patterns from the vehicle trajectories still remains. One approach towards alleviating this problem is to mine only a set of patterns that does not allow any sub-patterns in the output, in other words, to mine the so-called max patterns.
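As a minimal sketch of the max-pattern idea, assuming a set of frequent patterns has already been mined, the filter below discards every pattern that is a sub-pattern of another pattern in the set. The names `is_subsequence` and `max_patterns` and the toy input are hypothetical and chosen for illustration only; this is not the procedure used by any particular max SPM algorithm.

```python
from typing import Iterable, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(small: Sequence[str], big: Sequence[str]) -> bool:
    """True if the items of `small` appear in `big` in order (gaps allowed)."""
    it = iter(big)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in small)


def max_patterns(patterns: Iterable[Pattern]) -> List[Pattern]:
    """Keep only patterns that are not a sub-pattern of another pattern."""
    # Longest first, so any super-pattern is examined before its sub-patterns.
    ordered = sorted(set(patterns), key=lambda p: (-len(p), p))
    kept: List[Pattern] = []
    for p in ordered:
        if not any(is_subsequence(p, q) for q in kept):
            kept.append(p)
    return kept


# Hypothetical frequent patterns (illustration only, not data from the paper).
frequent = [("a",), ("a", "b"), ("a", "b", "c"), ("b", "c"), ("d", "e"), ("e",)]
result = max_patterns(frequent)
print(result)
```

Sorting by decreasing length means each pattern only needs to be compared against patterns already kept, since two distinct patterns of equal length cannot contain one another.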

Additionally, the pattern output can be further reduced by enforcing a constraint on the discovered sequential patterns that requires the items in the candidate patterns to exist contiguously in the underlying sequence database. This constraint is called contiguous SPM [7], [8] and is well suited to vehicle trajectories because it guarantees that the resulting vehicle patterns will always travel along real-world routes. Without this constraint, sequential patterns may be discovered that jump from place to place.
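The contiguity constraint can be illustrated with a small brute-force sketch. It assumes trajectories have already been converted into sequences of road node identifiers; the function names, the node labels `n1` to `n5`, and the toy database are hypothetical. Enumerating every contiguous sub-sequence is quadratic in the sequence length, so this only shows the definition of contiguous support and is not how an efficient contiguous SPM algorithm would proceed.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def is_contiguous(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` with no gaps between items."""
    n = len(pattern)
    target = tuple(pattern)
    return any(
        tuple(sequence[i:i + n]) == target
        for i in range(len(sequence) - n + 1)
    )


def frequent_contiguous(db: List[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Count, per contiguous pattern, how many sequences contain it."""
    support: Dict[Pattern, int] = defaultdict(int)
    for seq in db:
        seen: Set[Pattern] = set()  # count each pattern once per sequence
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                seen.add(tuple(seq[i:j]))
        for p in seen:
            support[p] += 1
    return {p: s for p, s in support.items() if s >= min_sup}


# Hypothetical road-node sequences (illustration only).
db = [
    ["n1", "n2", "n3", "n4"],
    ["n1", "n2", "n3", "n5"],
    ["n2", "n3", "n4"],
]
patterns = frequent_contiguous(db, min_sup=2)
print(sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])))
```

In this toy database the pair `("n1", "n3")` never becomes a pattern, even though both nodes appear in order in two sequences, because no vehicle travels directly from `n1` to `n3`.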

However, even with max contiguous SPM, the pattern output can still become highly redundant if the vehicle sequences are sufficiently homogeneous. To illustrate this problem we present an example in Fig. 1, which is a simplified scenario containing six vehicles moving through an intersection. In Table 1, we present the result of mining the set of contiguous max patterns from this example using a support of two (i.e., minSup = 2).

From Table 1, we observe that even the frequent contiguous sequential patterns can become quite redundant. Specifically, three-quarters of each pattern is repeated redundantly. We highlight that our scenario is not an exceptional or contrived case; rather, it demonstrates an issue that is exacerbated further when mining large real-world vehicle trajectories. Therefore, to solve this problem we present our algorithm to mine a set of Distinct Contiguous Sequential PAtterNs (DC-SPAN).
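To make the notion of repeated content concrete, the sketch below measures the longest contiguous run of items shared by two patterns as a fraction of the shorter pattern. This is an assumed, illustrative measure and not the redundancy definition used by DC-SPAN; the function names and the two example patterns are hypothetical and do not reproduce Fig. 1 or Table 1.

```python
from typing import Sequence


def longest_shared_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest contiguous run of items common to `a` and `b`."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous item of each pattern.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shared_fraction(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared run relative to the shorter pattern (both assumed non-empty)."""
    return longest_shared_run(a, b) / min(len(a), len(b))


# Two hypothetical routes that share an approach road and then diverge.
p1 = ("w", "x", "y", "north")
p2 = ("w", "x", "y", "south")
fraction = shared_fraction(p1, p2)
print(fraction)
```

Here the two patterns share three of their four items contiguously, which is the kind of overlap that a redundancy threshold would be expected to limit.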

Section snippets

Literature review

We briefly discuss the main differences between Association Rules Mining (ARM) and SPM, and review some recent approaches. Since our work has a strong basis in both SPM and trajectory data mining, we subsequently review relevant works from both of these fields.

Problem statement

Our problem is that, given a highly repetitious database of vehicle sequences and their underlying road network, we wish to find the most travelled segments of the underlying road network. Using existing data mining approaches, we can easily find the set of closed-contiguous or max-contiguous sequential patterns that exist at some user-specified minimum support. However, because real-world vehicle sequence databases are so large, with many shared roads that branch off, the output contains too

Methodology

In Fig. 2, we present our overall framework for mining distinct contiguous sequential patterns from vehicle trajectories. Briefly, each stage of our framework in Fig. 2 is as follows:

  1. Raw vehicle trajectories. The purpose of our framework is to extract a set of patterns that represents frequent routes that vehicles have taken within their relevant road networks. The vehicle trajectories we consider in this work are all recorded using GPS and are stored as plain-text files as sequences of

Results and discussion

In order to gauge the efficiency and effectiveness of DC-SPAN at mining distinct contiguous sequential patterns from large vehicle trajectories, we conducted experiments measuring running time, compression, distinctness, and redundancy. Where appropriate we compared DC-SPAN against other contiguous SPM algorithms that mined all, closed, and max patterns. One problem we faced is that we planned to use CM-SPAM [23] to mine the set of all contiguous patterns and VMSP [39] to mine the set of all

Conclusion

Although there exist many efficient and effective SPM algorithms, none of them are particularly well suited for mining large vehicle trajectories where the patterns should ideally both be contiguous and have their redundancy controlled. In this work, we have presented our approach, DC-SPAN, to solve this problem. Through experimentation we have shown that DC-SPAN is able to mine distinct, non-redundant, contiguous sequential patterns from large and varied real-world vehicle trajectories with

References (44)

  • ChenZ. et al.

    Discovering popular routes from trajectories

  • AtevS. et al.

    Clustering of vehicle trajectories

    IEEE Trans. Intell. Transp. Syst.

    (2010)
  • ChenZ. et al.

    Discovering popular routes from trajectories

  • WangY. et al.

    Travel time estimation of a path using sparse trajectories

  • YangC. et al.

    Mining and visual exploration of closed contiguous sequential patterns in trajectories

    Int. J. Geogr. Inf. Sci.

    (2018)
  • BogornyV. et al.

    Reducing uninteresting spatial association rules in geographic databases using background knowledge: A summary of results

    Int. J. Geogr. Inf. Sci.

    (2008)
  • BogornyV. et al.

    Semantic-based pruning of redundant and uninteresting frequent geographic patterns

    GeoInformatica

    (2010)
  • LeeI. et al.

    Urban crime analysis through areal categorized multivariate associations mining

    Appl. Artif. Intell.

    (2008)
  • LeeI. et al.

    Exploration of massive crime data sets through data mining techniques

    Appl. Artif. Intell.

    (2011)
  • SrikantR. et al.

    Mining sequential patterns: Generalizations and performance improvements

  • YangZ. et al.

    Lapin: Effective sequential pattern mining algorithms by last position induction for dense databases

  • AyresJ. et al.

    Sequential pattern mining using a bitmap representation

    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105076.
