Abstract
Visualizing frequently occurring patterns and potentially unusual behaviors in trajectory can provide valuable insights into activities behind the data. In this paper, we introduce TrajViz, a motif (frequently repeated subsequences) based visualization software that detects patterns and anomalies by inducing “grammars” from discretized spatial trajectories. We consider patterns as a set of sub-trajectories with unknown lengths that are spatially similar to each other. We demonstrate that TrajViz has the capacity to help users visualize anomalies and patterns effectively.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
1 Introduction
With the rapid growth of tracking technology, a large amount of trajectory data are generated from users’ daily activities. Discovering frequently occurring patterns (motifs) and potentially unusual behaviors can be used to summarize the overwhelming amount of trajectories data and obtain meaningful knowledge. In this paper, we present TrajViz, a software that visualizes patterns and anomalies in trajectory datasets. TrajViz extends our previous work in time series motif discovery [1] to sub-trajectory pattern visualization. We consider patterns as a set of sub-trajectories with unknown lengths that are spatially similar to each other. We use a grid-based discretization approach to remove the speed information and adapt a grammar-based motif discovery algorithm, Iterative Sequitur (ItrSequitur), to discover the patterns. We design a user-friendly interface to allow visualization of repeated, as well as unusual sub-trajectories within the datasets.
2 Relate Work and Overview of TrajViz
Previously, we introduced a grammar-based motif discovery framework [7], which uses Sequitur [4], a grammar induction algorithm, to find approximate motifs of variable lengths in time series. However, the unique characteristics and challenges associated with spatial trajectory data make it unsuitable and difficult to apply the algorithms directly on trajectory data. In [5], the authors introduced STAVIS, a trajectory analytical system that uses grammar induction to infer variable-length patterns. However, its definition of “pattern” is based on time series motifs. Therefore, speed variation will significantly affect the quality of patterns discovered. Other work such as [2, 9] focuses on either sequential pattern mining based on important locations, or trajectory clustering, both of which are different from the goal of our software.
A screenshot of TrajViz is shown in Fig. 1. TrajViz follows the Visual Information-Seeking Mantra [8]. After processing the data, an overview heat map of pattern density is displayed. User can zoom in to see the detailed map and use domain knowledge to filter out unwanted patterns by setting minimum frequency, minimum continuous blocks length (Minimal Motif Length) and maximum frequency for anomaly detection (Anomaly Frequency). Adjusting these thresholds does not require re-running the discretization and grammar induction steps (introduced in the next subsection). Further details on TrajViz can be found in goo.gl/cKCeDt.
Screenshot of TrajViz and default view for San Franciso Taxi data [6]
3 Our Approach
3.1 Discretization
Before we can induce grammars on trajectory data, it is necessary to pre-process the data. We first convert the trajectory data to speed-insensitive symbolic sequences after removing noises from the trajectory dataset. To prepare for discretization, we divide the entire region into an \((\alpha \times \alpha )\) equal-frequency grid, where \(\alpha \) is the grid size. We assign each grid cell a block ID sequentially from left to right and from top to bottom.
After block IDs are assigned, we use a four-step procedure to convert raw trajectory to a block ID sequence \(S_{block}\). First, we up-sample the raw trajectory by using linear interpolation to ensure that the consecutive blocks in \(S_{block}\) are spatially adjacent. Then trajectories are converted into block ID sequences based on the order of traversal. Next, we perform further noise removal by removing blocks that are barely covered by the trajectory. Finally, numerosity reduction [3] is adopted to compress the sequence by only recording the first occurrence of consecutively repeating symbol. \(S_{block}\) is insensitive to speed variation. This is an important property that allows us to detect spatially-similar sub-trajectories.
Example of patterns detected in San Franciso Taxi Dataset [6] (a) Motif Heatmap (b) A pattern indicates a frequently visited route from the city to airport (c) An unusual (infrequent) round trip route (Color figure online)
3.2 Grammar Induction with ItrSequitur
As demonstrated in previous work [7], a context-free grammar summarizes the structure of an input sequence. Intuitively, repeated substrings in \(S_{block}\) represent a set of similar sub-trajectories. Therefore, learning a set of grammar rules to identify repeating substrings from \(S_{block}\) can discover frequently occurring patterns (sub-trajectories) in trajectory data. Previous work [5] utilizes Sequitur [4], a linear complexity grammar induction approach, to learn the grammar rules. However, Sequitur can only detect patterns if they have identical symbolic representation. In TrajViz, we adapt an iterative version of Sequitur, called ItrSequitur [1], for more robust grammar induction. ItrSequitur iteratively rewrites the input sequence based on the output of Sequitur and re-induces the grammar on the revised sequence until no new grammar can be found. Different from Sequitur, ItrSequitur allows small variation in matching substrings. Therefore, it is robust to noise in the dataset.
3.3 Patterns/Anomalies Discovery and Motif Heatmap
TrajViz consolidates the patterns detected by merging patterns that have similar symbolic representations. Top-ranked frequent patterns that satisfy user-defined filtering conditions are listed in the motifs/anomalies table. User can navigate the patterns by clicking through the items in the table; a zoom-in of the selected pattern is then shown on the right panel. Figure 2 shows screenshots of a motif and an anomaly detected. To show the direction of the trajectories, the start points are marked by black circles, and the end points are denoted by black squares.
For each point in a motif, we compute the point density by counting the number of points from other motifs within some distance threshold, and create a motif heatmap. A five-color gradient (blue-cyan-green-yellow-red) is built to linearly map the densities to their specific colors. The most dense points have the red colors while the least dense ones are in blue.
To find anomalies, we create a trajectory rule-density curve by counting the number of grammar rules covering each consecutive pair of block IDs (we consider a pair at a time in order to preserve the direction of the trajectory). The intuition is that, an anomalous subsequence would have zero or very few repetitions, hence low rule-density. TrajViz finds low-density subsequences within a trajectory and marks them as unusual routes (Fig. 2(c)).
4 Target Audience
TrajViz provides an efficient, interpretable, and user-interactive mechanism to understand functional activities behind massive trajectory data. TrajViz targets a diverse audience including researchers, practitioners, and scientists who are interested in discovering patterns in trajectory data.
References
Gao, Y., Lin, J., Rangwala, H.: Iterative grammar-based framework for discovering variable-length time series motifs. In: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 7–12. IEEE (2016)
Lee, J.-G., Han, J., Li, X., Gonzalez, H.: Traclass: trajectory classification using hierarchical region-based and trajectory-based clustering. Proc. VLDB Endow. 1(1), 1081–1094 (2008)
Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. Data Min. Knowl. Disc. 15(2), 107–144 (2007)
Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical strcture in sequences: a linear-time algorithm. J. Artif. Intell. Res. (JAIR) 7, 67–82 (1997)
Oates, T., Boedihardjo, A.P., Lin, J., Chen, C., Frankenstein, S., Gandhi, S.: Motif discovery in spatial trajectories using grammar inference. In: Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, pp. 1465–1468. ACM (2013)
Piorkowski, M., Sarafijanovic-Djukic, N., Grossglauser, M.: A parsimonious model of mobile partitioned networks with clustering. In: 2009 First International Communication Systems and Networks and Workshops, pp. 1–10. IEEE (2009)
Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S., Lerner, M.: GrammarViz 2.0: a tool for grammar-based pattern discovery in time series. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 468–472. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_37
Shneiderman, B.: The eyes have it: a task by data type taxonomy for information visualizations. In: IEEE Symposium on Visual Languages, 1996. Proceedings, pp. 336–343. IEEE (1996)
Zheng, Y., Zhang, L., Xie, X., Ma, W.-Y.: Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the 18th International Conference on World Wide Web, pp. 791–800. ACM (2009)
Acknowledgements
We would like to thank Ranjeev Mittu at the Naval Research Lab (NRL) for the support and valuable suggestions on our work.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Gao, Y., Li, Q., Li, X., Lin, J., Rangwala, H. (2017). TrajViz: A Tool for Visualizing Patterns and Anomalies in Trajectory. In: Altun, Y., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science(), vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_45
Download citation
DOI: https://doi.org/10.1007/978-3-319-71273-4_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71272-7
Online ISBN: 978-3-319-71273-4
eBook Packages: Computer ScienceComputer Science (R0)