Mining distinct and contiguous sequential patterns from large vehicle trajectories☆
Introduction
Due to the affordability and widespread availability of GPS technology, generating and collecting vehicle movements, or trajectories, is relatively straightforward and cost effective. These vehicle trajectories present a valuable opportunity to extract knowledge in domains such as urban planning [1], route planning [2], and traffic congestion [3]. In this work we focus on extracting this knowledge from vehicle trajectories through Sequential Pattern Mining (SPM). SPM is the process of finding frequently occurring sequences within a sequence database. However, SPM of vehicle trajectories is difficult for two reasons: (1) SPM requires sequences of discrete items, but because GPS technology suffers from spatial uncertainty and urban black holes, the trajectory recordings are often quite noisy and far from discrete; (2) vehicle trajectories commonly contain hundreds of thousands, if not millions, of recordings, which causes many existing SPM approaches to produce massive, redundant, and therefore incomprehensible pattern outputs [4], [5].
The first problem is solved in other work [6] that constrains the vehicle trajectories to the appropriate road network as a pre-processing step. Doing so removes spatial uncertainty and converts the trajectories into discrete sequences of road node visitations. However, the second problem of mining a smaller, less redundant, set of sequential patterns from the vehicle trajectories still remains. One approach towards alleviating this problem is to only mine a set of patterns that does not allow any sub-patterns in the output, in other words, to mine the so-called max patterns.
Additionally, the pattern output can be further reduced by enforcing a constraint on the discovered sequential patterns that requires the items in the candidate patterns to exist contiguously in the underlying sequence database. This constraint is called contiguous SPM [7], [8] and is well suited to vehicle trajectories because it guarantees that the resulting vehicle patterns will always travel along real-world routes. Without this constraint, sequential patterns may be discovered that jump from place to place.
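To make the contiguity constraint concrete, the following minimal Python sketch contrasts contiguous containment with ordinary (gap-allowing) containment. It is an illustration under stated assumptions, not the authors' implementation: the function names (`occurs_contiguously`, `occurs_with_gaps`, `support`) and the road-node labels are hypothetical, and support is assumed to be counted once per sequence.

```python
from typing import List, Sequence


def occurs_contiguously(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` appears in `sequence` as an unbroken run of items."""
    n, m = len(pattern), len(sequence)
    return any(
        tuple(sequence[i:i + n]) == tuple(pattern) for i in range(m - n + 1)
    )


def occurs_with_gaps(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]],
            contiguous: bool = True) -> int:
    """Number of sequences in `database` that contain `pattern`."""
    check = occurs_contiguously if contiguous else occurs_with_gaps
    return sum(1 for seq in database if check(pattern, seq))


# Hypothetical road-node sequences (assumed data, not from the paper).
database = [
    ["n1", "n2", "n3", "n4"],
    ["n1", "n3", "n4"],
    ["n2", "n3"],
]

print(support(("n1", "n3"), database, contiguous=True))   # 1
print(support(("n1", "n3"), database, contiguous=False))  # 2
```

In this toy database the pattern (n1, n3) is supported by two sequences when gaps are allowed, but only one vehicle actually drove from n1 directly to n3. The contiguous count therefore reflects travel along a real route segment, whereas the gap-allowing count includes a "jump" over n2.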
However, even with max contiguous SPM, the pattern output can still become highly redundant if the vehicle sequences are sufficiently homogeneous. To illustrate this problem we present an example in Fig. 1, which is a simplified scenario containing six vehicles moving through an intersection. In Table 1, we present the result of mining the set of contiguous max patterns from this example using a support of two.
From Table 1, we observe that even the frequent contiguous sequential patterns can become quite redundant. Specifically, three-quarters of each pattern is repeated. We highlight that our scenario is not an exceptional or contrived case; rather, it demonstrates an issue that is further exacerbated when mining large real-world vehicle trajectories. Therefore, to solve this problem we present our algorithm to mine a set of Distinct Contiguous Sequential PAtterNs (DC-SPAN).
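The kind of redundancy described above can be reproduced on a toy database. The sketch below is a brute-force illustration and is not DC-SPAN or any algorithm from the paper. The six sequences and their node labels are invented for this example and are not the data of Fig. 1 or Table 1, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def frequent_contiguous(database: Iterable[Sequence[str]],
                        min_sup: int) -> Dict[Pattern, int]:
    """Enumerate every contiguous sub-sequence and keep those whose
    per-sequence support is at least `min_sup` (brute force, toy data only)."""
    counts: Counter = Counter()
    for seq in database:
        seen = set()
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                seen.add(tuple(seq[i:j]))
        counts.update(seen)  # each sequence contributes at most once
    return {p: c for p, c in counts.items() if c >= min_sup}


def is_contiguous_subpattern(small: Pattern, large: Pattern) -> bool:
    """True if `small` appears inside `large` as an unbroken run."""
    n = len(small)
    return any(large[i:i + n] == small for i in range(len(large) - n + 1))


def max_contiguous(frequent: Iterable[Pattern]) -> List[Pattern]:
    """Keep only patterns not contained in another, longer frequent pattern."""
    kept: List[Pattern] = []
    # Longest first, so any containing pattern is examined before its parts.
    for p in sorted(frequent, key=lambda p: (-len(p), p)):
        if not any(is_contiguous_subpattern(p, q) for q in kept):
            kept.append(p)
    return kept


# Assumed scenario: six vehicles share the approach n1 -> n2 -> n3,
# then leave the intersection by one of three exits (e1, s1, w1).
database = [
    ["n1", "n2", "n3", "e1"], ["n1", "n2", "n3", "e1"],
    ["n1", "n2", "n3", "s1"], ["n1", "n2", "n3", "s1"],
    ["n1", "n2", "n3", "w1"], ["n1", "n2", "n3", "w1"],
]

patterns = max_contiguous(frequent_contiguous(database, min_sup=2))
for p in patterns:
    print(p)
```

With a minimum support of two, this toy input yields three max contiguous patterns that differ only in their final node, so the shared approach n1, n2, n3 is reported three times. The max constraint cannot remove this overlap because none of the three patterns contains another.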
Section snippets
Literature review
We briefly discuss the main differences between Association Rules Mining (ARM) and SPM, and review some recent approaches. Since our work has a strong basis in both SPM and trajectory data mining, we subsequently review relevant works from both of these fields.
Problem statement
Our problem is as follows: given a highly repetitious database of vehicle sequences and their underlying road network, we wish to find the most travelled segments of that road network. Using existing data mining approaches, we can easily find the set of closed-contiguous or max-contiguous sequential patterns that exist at some user-specified minimum support. However, because real-world vehicle sequence databases are so large, with many shared roads that branch off, the output contains too
Methodology
In Fig. 2, we present our overall framework for mining distinct contiguous sequential patterns from vehicle trajectories. Briefly, each stage of our framework in Fig. 2 is as follows:
1. Raw vehicle trajectories. The purpose of our framework is to extract a set of patterns that represents frequent routes that vehicles have taken within their relevant road networks. The vehicle trajectories we consider in this work are all recorded using GPS and are stored as plain-text files as sequences of
Results and discussion
In order to gauge the efficiency and effectiveness of DC-SPAN at mining distinct contiguous sequential patterns from large vehicle trajectories, we conducted experiments measuring running time, compression, distinctness, and redundancy. Where appropriate we compared DC-SPAN against other contiguous SPM algorithms that mined all, closed, and max patterns. One problem we faced is that we planned to use CM-SPAM [23] to mine the set of all contiguous patterns and VMSP [39] to mine the set of all
Conclusion
Although there exist many efficient and effective SPM algorithms, none of them are particularly well suited for mining large vehicle trajectories where the patterns should ideally both be contiguous and have their redundancy controlled. In this work, we have presented our approach, DC-SPAN, to solve this problem. Through experimentation we have shown that DC-SPAN is able to mine distinct, non-redundant, contiguous sequential patterns from large and varied real-world vehicle trajectories with
References (44)
- et al. Urban traffic congestion estimation and prediction based on floating car trajectory data. Future Gener. Comput. Syst. (2016)
- et al. CCSpan: Mining closed contiguous sequential patterns. Knowl.-Based Syst. (2015)
- et al. Mining co-distribution patterns for large crime datasets. Expert Syst. Appl. (2012)
- et al. Train-movement situation recognition for safety justification using moving-horizon TBM-based multisensor data fusion. Knowl.-Based Syst. (2019)
- et al. Safety justification of train movement dynamic processes using evidence theory and reference models. Knowl.-Based Syst. (2018)
- et al. An approach to compute user similarity for GPS applications. Knowl.-Based Syst. (2016)
- et al. Learning traffic signal phase and timing information from low-sampling rate taxi GPS trajectories. Knowl.-Based Syst. (2016)
- et al. Mining sequential patterns for protein fold recognition. J. Biomed. Inform. (2008)
- et al. Evaluation of traffic data obtained via GPS-enabled mobile phones: The Mobile Century field experiment. Transp. Res. C (2010)
- et al. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. (2014)
- Discovering popular routes from trajectories
- Clustering of vehicle trajectories. IEEE Trans. Intell. Transp. Syst.
- Travel time estimation of a path using sparse trajectories
- Mining and visual exploration of closed contiguous sequential patterns in trajectories. Int. J. Geogr. Inf. Sci.
- Reducing uninteresting spatial association rules in geographic databases using background knowledge: A summary of results. Int. J. Geogr. Inf. Sci.
- Semantic-based pruning of redundant and uninteresting frequent geographic patterns. GeoInformatica
- Urban crime analysis through areal categorized multivariate associations mining. Appl. Artif. Intell.
- Exploration of massive crime data sets through data mining techniques. Appl. Artif. Intell.
- Mining sequential patterns: Generalizations and performance improvements
- LAPIN: Effective sequential pattern mining algorithms by last position induction for dense databases
- Sequential pattern mining using a bitmap representation
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105076.