Abstract
The analysis of large-scale trajectory data has tremendous benefits for applications ranging from transportation planning to traffic management. A fundamental building block for the analysis of such data is the computation of similarity between trajectories. Existing work on similarity computation focuses mainly on the spatial aspects of trajectories and rarely takes time into account in conjunction with space. A key challenge when considering time is how to handle trajectories that are sampled asynchronously or at variable rates, which can lead to uncertainty. To tackle this problem, we quantify trajectory similarity as an interval, rather than a single value, to capture the uncertainty that can result from different sampling rates and asynchronous sampling. Based on this perspective, we develop a new trajectory similarity measure, Trajectory Interval Distance Estimation (TIDE), which models similarity computation as a convex optimisation problem. Using two real datasets, we demonstrate that our proposed measure is extremely effective for assessing similarity in comparison to existing state-of-the-art measures.
Notes
https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm. Accessed 20 Jun 2018.
References
Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. International conference on foundations of data organization and algorithms. Springer, Heidelberg, pp 69–84
Alt H, Godau M (1995) Computing the Fréchet distance between two polygonal curves. Int J Comput Geom Appl 5(01n02):75–91
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. KDD Workshop 10(16):359–370
Biagioni J, Eriksson J (2012) Map inference in the face of noise and disparity. In: Proceedings of the 20th international conference on advances in geographic information systems. ACM, pp 79–88
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Chen L, Ng R (2004) On the marriage of lp-norms and edit distance. In: Proceedings of the 30th international conference on very large data bases, vol 30. VLDB Endowment, pp 792–803
Chen L, Ozsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM international conference on management of data (ACM SIGMOD). ACM, pp 491–502
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the 30th international conference on very large data, vol 1, no 2. VLDB Endowment, pp 1542–1552
Eiter T, Mannila H (1994) Computing discrete Fréchet distance. In: Tech. report CD-TR 94/64, Information Systems Department, Technical University of Vienna
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):12
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. ACM SIGMOD Rec 23(2):419–429
Frentzos E, Gratsias K, Theodoridis Y (2007) Index-based most similar trajectory search. In: Proceedings of the 23rd international conference on data engineering (ICDE). IEEE, pp 816–825
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM T Infor Syst 20(4):422–446
Keogh EJ, Pazzani MJ (1998) An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. KDD 98(1):239–243
Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386
Kuijpers B, Moelans B, Othman W, Vaisman A (2009) Analyzing trajectories using uncertainty and background information. International symposium on spatial and temporal databases. Springer, Heidelberg, pp 135–152
Laube P, Imfeld S (2002) Analyzing relative motion within groups of trackable moving point objects. International conference on geographic information science. Springer, Heidelberg, pp 132–144
Lee JG, Han J, Whang KY (2007) Trajectory clustering: a partition-and-group framework. In: Proceedings of the international conference on management of data (ACM SIGMOD). ACM, pp 593–604
Lin B, Su J (2005) Shapes based trajectory queries for moving objects. In: Proceedings of the 13th annual ACM international workshop on geographic information systems. ACM, pp 21–30
Mamoulis N, Cao H, Kollios G, Hadjieleftheriou M, Tao Y, Cheung DW (2004) Mining, indexing, and querying historical spatiotemporal data. In: Proceedings of the 10th international conference on knowledge discovery and data mining (ACM SIGKDD). ACM, pp 236–245
Paparrizos J, Gravano L (2015) k-shape: Efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 1855–1870
Pelekis N, Kopanakis I, Marketos G, Ntoutsi I, Andrienko G, Theodoridis Y (2007) Similarity search in trajectory databases. In: 14th international symposium on temporal representation and reasoning. IEEE, pp 129–140
Piorkowski M, Sarafijanovic-Djukic N, Grossglauser M (2009) A parsimonious model of mobile partitioned networks with clustering. In: First international communication systems and networks and workshops. IEEE, pp 1–10
Ranu S, Deepak P, Telang AD, Deshpande P, Raghavan S (2015) Indexing and matching trajectories under inconsistent sampling rates. In: Proceeding of IEEE 31st international conference on data engineering (ICDE). IEEE, pp 999–1010
Su H, Zheng K, Wang H, Huang J, Zhou X (2013) Calibrating trajectory data for similarity-based analysis. In: Proceedings of the ACM international conference on management of data (ACM SIGMOD). ACM, pp 833–844
Tang B, Yiu ML, Mouratidis K, Wang K (2017) Efficient motif discovery in spatial trajectories using discrete Fréchet distance. In: International conference on extending database technology (EDBT)
Trajcevski G, Ding H, Scheuermann P, Tamassia R, Vaccaro D (2007) Dynamics-aware similarity of moving objects trajectories. In: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems. ACM, p 11
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (ACM SIGKDD). ACM, pp 216–225
Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of the 18th international conference on data engineering (ICDE). IEEE, pp 673–684
Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: Proceedings of 18th international conference on advances in geographic information systems. ACM, pp 99–108
Zheng K, Trajcevski G, Zhou X, Scheuermann P (2011) Probabilistic range queries for uncertain trajectories on road networks. In: Proceedings of the 14th international conference on extending database technology. ACM, pp 283–294
Zheng K, Zheng Y, Xie X, Zhou X (2012) Reducing uncertainty of low-sampling-rate trajectories. In: IEEE 28th international conference on data engineering (ICDE). IEEE, pp 1144–1155
Additional information
Responsible editor: Srinivasan Parthasarathy.
A Appendix
A.1 Accuracy using NDCG measure
In this section, we use Normalized Discounted Cumulative Gain (NDCG; Järvelin and Kekäläinen 2002) to examine the accuracy of the measures discussed in Sect. 5 on the Cabspotting dataset.
As in Sect. 5, for every measure the ground truth is built by computing the set of k nearest neighbours (k-NN) of a given query trajectory from a given set DB of original reference trajectories. The trajectories in k-NN are ordered from most similar to least similar. We assign a relevance value of k to the first (most similar) trajectory, decreasing to a relevance value of 1 for the last trajectory. We then compute k-NN’ for the same query trajectory from the dataset with the lower sampling rate. We eliminate all trajectories in k-NN’ \(\setminus \) k-NN, that is, we keep only k-NN’ \(\cap \) k-NN. Each remaining trajectory is assigned a relevance based on its rank in k-NN’, and a trajectory of k-NN that does not appear in k-NN’ receives a relevance of 0. For example, let \(T_1\) to \(T_{10}\) be the reference trajectories and Q a given query trajectory. Suppose that, under one of the measures, the 4-NN set for Q is \(<T_4, T_9, T_2, T_7>\), in order of decreasing similarity (\(T_4\) is the most similar trajectory to Q, \(T_9\) is the second most similar, and so on). We assign relevance values according to similarity rank, \(<4,3,2,1>\), so \(T_4\) has a relevance value of 4, the highest relevance to Q. We also generate a lower-sampled version of \(T_1\) to \(T_{10}\) by randomly choosing \(50\%\) of their sampled points, which gives trajectories \(T'_1\) to \(T'_{10}\). We then extract 4-NN’ for the query trajectory Q from \(T'_1\) to \(T'_{10}\). Ideally, the measure extracts the same set of similar trajectories in the same order, so that the NDCG of 4-NN’ with respect to 4-NN is 1. If, instead, the measure extracts \(<T_9, T_4, T_5, T_7>\) as the most similar trajectories in order, we eliminate \(T_5\) because it is not in k-NN. The relevance values for the set \(<T_4, T_9, T_2, T_7>\) using k-NN’ are then \(<3,4,0,1>\): \(T_4\) has a relevance of 3 in k-NN’, \(T_9\) has a relevance of 4, \(T_2\) is not in k-NN’, and \(T_7\) has a relevance of 1.
Thus, the ideal relevance for the set \(<T_4, T_9, T_2, T_7>\) using the given measure is \(<4,3,2,1>\), whereas for the lower-sampled version of the trajectories it is \(<3,4,0,1>\). We compute the DCG of \(<4,3,2,1>\) as the ideal DCG (IDCG) and the DCG of \(<3,4,0,1>\) as the given DCG (GDCG), and obtain the NDCG by dividing GDCG by IDCG.
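The worked example can be reproduced with a short script. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the discount \(1/\log_2(i+1)\) at rank i is an assumption, since the appendix does not state which DCG variant is used.

```python
import math


def relevance_from_ranking(ranking):
    """Assign relevance k to the first item of a ranked list, down to 1 for the last."""
    k = len(ranking)
    return {traj: k - i for i, traj in enumerate(ranking)}


def dcg(relevances):
    """DCG with a log2(rank + 1) discount (assumed variant; ranks start at 1)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_against_ground_truth(knn, knn_low):
    """NDCG of the lower-sampled ranking knn_low with respect to the ground truth knn."""
    ideal = relevance_from_ranking(knn)
    given = relevance_from_ranking(knn_low)
    # Walk the ground-truth order; trajectories missing from knn_low get relevance 0.
    # Trajectories that appear only in knn_low are never visited, i.e. eliminated.
    ideal_vec = [ideal[t] for t in knn]
    given_vec = [given.get(t, 0) for t in knn]
    return dcg(given_vec) / dcg(ideal_vec)


if __name__ == "__main__":
    knn = ["T4", "T9", "T2", "T7"]      # ground truth, relevance <4,3,2,1>
    knn_low = ["T9", "T4", "T5", "T7"]  # lower-sampled result, gives <3,4,0,1>
    print(round(ndcg_against_ground_truth(knn, knn_low), 3))
```

Under the assumed discount, the example yields an NDCG of about 0.813, and two identical rankings yield exactly 1.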
Figure 13 shows the results for the Cabspotting dataset. As with Spearman’s rank correlation, TIDE and TIDE* achieve higher accuracy than the other measures. However, since NDCG does not penalize “bad” trajectories in the results, the scores are higher than those obtained with Spearman’s rank correlation.
A.2 The impact of estimated maximum speed
As discussed before, when we have no information about the speed limits of an object, we estimate its maximum speed from the sampled points of its trajectories (Sect. 3). In this experiment, we examine the impact of “estimating” the maximum speed. Because the estimate may underestimate the true maximum speed, we want to see how increasing the estimated maximum speed affects the accuracy. The ground truth is the same as in the previous experiment; however, we increase the estimated maximum speed for the lower-sampled version of the Cabspotting dataset. The outcome is that there is no considerable impact on the accuracy. As an example, Fig. 14 shows the results for the different-sampling-rates experiment of Fig. 9c. TIDE-Speed1 and TIDE*-Speed1 show the results for the original speed estimation (Fig. 9c), and TIDE-Speed2 and TIDE*-Speed2 show the results for the speed increased by 25 percent.
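To make the setup concrete, the following is a minimal sketch of one plausible estimator, not the procedure of Sect. 3, which is not reproduced here. It assumes that the maximum speed is taken as the largest speed observed between consecutive sampled points, that points are planar (x, y, t) tuples sorted by time, and that the 25 percent increase is a simple multiplicative factor. All function names are hypothetical.

```python
import math


def estimate_max_speed(points):
    """Largest speed between consecutive samples (assumed estimator).

    points: list of (x, y, t) tuples sorted by timestamp t, in consistent units.
    """
    best = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            # Skip duplicate or out-of-order timestamps rather than divide by zero.
            continue
        best = max(best, math.hypot(x1 - x0, y1 - y0) / dt)
    return best


def inflate_speed(speed, increase=0.25):
    """Increase an estimated speed by a fraction, e.g. 0.25 for 25 percent."""
    return speed * (1.0 + increase)


if __name__ == "__main__":
    trajectory = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (3.0, 4.0, 3.0)]
    speed1 = estimate_max_speed(trajectory)  # analogous to the "Speed1" setting
    speed2 = inflate_speed(speed1)           # analogous to the "Speed2" setting
    print(speed1, speed2)
```

An estimator of this kind can only underestimate the true maximum speed, because the object may have moved faster between samples than the straight-line segment speed suggests. This is why the experiment probes larger values rather than smaller ones.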
Cite this article
Naderivesal, S., Kulik, L. & Bailey, J. An effective and versatile distance measure for spatiotemporal trajectories. Data Min Knowl Disc 33, 577–606 (2019). https://doi.org/10.1007/s10618-019-00615-5
DOI: https://doi.org/10.1007/s10618-019-00615-5