An effective and versatile distance measure for spatiotemporal trajectories

Naderivesal, Somayeh; Kulik, Lars; Bailey, James

doi:10.1007/s10618-019-00615-5

Published: 06 February 2019

Volume 33, pages 577–606 (2019)

Data Mining and Knowledge Discovery

Abstract

The analysis of large-scale trajectory data has tremendous benefits for applications ranging from transportation planning to traffic management. A fundamental building block for the analysis of such data is the computation of similarity between trajectories. Existing work on similarity computation focuses mainly on the spatial aspects of trajectories and rarely takes time into account in conjunction with space. A key challenge when considering time is how to handle trajectories that are sampled asynchronously or at variable rates, which can lead to uncertainty. To tackle this problem, we quantify trajectory similarity as an interval, rather than a single value, to capture the uncertainty that can result from different sampling rates and asynchronous sampling. Based on this perspective, we develop a new trajectory similarity measure, Trajectory Interval Distance Estimation (TIDE), which models similarity computation as a convex optimisation problem. Using two real datasets, we demonstrate that our proposed measure is extremely effective for assessing similarity in comparison to existing state-of-the-art measures.

Notes

https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm. Accessed 20 Jun 2018.

References

Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. International conference on foundations of data organization and algorithms. Springer, Heidelberg, pp 69–84
Alt H, Godau M (1995) Computing the Fréchet distance between two polygonal curves. Int J Comput Geom Appl 5(01n02):75–91
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. KDD Workshop 10(16):359–370
Biagioni J, Eriksson J (2012) Map inference in the face of noise and disparity. In: Proceedings of the 20th international conference on advances in geographic information systems. ACM, pp 79–88
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Chen L, Ng R (2004) On the marriage of lp-norms and edit distance. In: Proceedings of the 30th international conference on very large data bases, vol 30. VLDB Endowment, pp 792–803
Chen L, Ozsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM international conference on management of data (ACM SIGMOD). ACM, pp 491–502
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the 30th international conference on very large data, vol 1, no 2. VLDB Endowment, pp 1542–1552
Eiter T, Mannila H (1994) Computing discrete Fréchet distance. In: Tech. report CD-TR 94/64, Information Systems Department, Technical University of Vienna
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):12
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. ACM SIGMOD Rec 23(2):419–429
Frentzos E, Gratsias K, Theodoridis Y (2007) Index-based most similar trajectory search. In: Proceedings of the 23rd international conference on data engineering (ICDE). IEEE, pp 816–825
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM T Infor Syst 20(4):422–446
Keogh EJ, Pazzani MJ (1998) An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. KDD 98(1):239–243
Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386
Kuijpers B, Moelans B, Othman W, Vaisman A (2009) Analyzing trajectories using uncertainty and background information. International symposium on spatial and temporal databases. Springer, Heidelberg, pp 135–152
Laube P, Imfeld S (2002) Analyzing relative motion within groups of trackable moving point objects. International conference on geographic information science. Springer, Heidelberg, pp 132–144
Lee JG, Han J, Whang KY (2007) Trajectory clustering: a partition-and-group framework. In: Proceedings of the international conference on management of data (ACM SIGMOD). ACM, pp 593–604
Lin B, Su J (2005) Shapes based trajectory queries for moving objects. In: Proceedings of the 13th annual ACM international workshop on geographic information systems. ACM, pp 21–30
Mamoulis N, Cao H, Kollios G, Hadjieleftheriou M, Tao Y, Cheung DW (2004) Mining, indexing, and querying historical spatiotemporal data. In: Proceedings of the 10th international conference on knowledge discovery and data mining (ACM SIGKDD). ACM, pp 236–245
Paparrizos J, Gravano L (2015) k-shape: Efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 1855–1870
Pelekis N, Kopanakis I, Marketos G, Ntoutsi I, Andrienko G, Theodoridis Y (2007) Similarity search in trajectory databases. In: 14th international symposium on temporal representation and reasoning. IEEE, pp 129–140
Piorkowski M, Sarafijanovic-Djukic N, Grossglauser M (2009) A parsimonious model of mobile partitioned networks with clustering. In: First international communication systems and networks and workshops. IEEE, pp 1–10
Ranu S, Deepak P, Telang AD, Deshpande P, Raghavan S (2015) Indexing and matching trajectories under inconsistent sampling rates. In: Proceeding of IEEE 31st international conference on data engineering (ICDE). IEEE, pp 999–1010
Su H, Zheng K, Wang H, Huang J, Zhou X (2013) Calibrating trajectory data for similarity-based analysis. In: Proceedings of the ACM international conference on management of data (ACM SIGMOD). ACM, pp 833–844
Tang B, Yiu ML, Mouratidis K, Wang K (2017) Efficient motif discovery in spatial trajectories using discrete Fréchet distance. In: International conference on extending database technology (EDBT)
Trajcevski G, Ding H, Scheuermann P, Tamassia R, Vaccaro D (2007) Dynamics-aware similarity of moving objects trajectories. In: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems. ACM, p 11
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (ACM SIGKDD). ACM, pp 216–225
Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of the 18th international conference on data engineering (ICDE). IEEE, pp 673–684
Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: Proceedings of 18th international conference on advances in geographic information systems. ACM, pp 99–108
Zheng K, Trajcevski G, Zhou X, Scheuermann P (2011) Probabilistic range queries for uncertain trajectories on road networks. In: Proceedings of the 14th international conference on extending database technology. ACM, pp 283–294
Zheng K, Zheng Y, Xie X, Zhou X (2012) Reducing uncertainty of low-sampling-rate trajectories. In: IEEE 28th international conference on data engineering (ICDE). IEEE, pp 1144–1155

Author information

Authors and Affiliations

School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Somayeh Naderivesal, Lars Kulik & James Bailey

Corresponding author

Correspondence to Somayeh Naderivesal.

Additional information

Responsible editor: Srinivasan Parthasarathy.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

A.1 Accuracy using NDCG measure

In this section, we use Normalized Discounted Cumulative Gain (NDCG; Järvelin and Kekäläinen 2002) to examine the accuracy of the measures discussed in Sect. 5 on the Cabspotting dataset.

For every measure, as in Sect. 5, the ground truth is built by computing the set of k nearest neighbours (k-NN) of a given query trajectory from a given set of original reference trajectories DB. The trajectories in k-NN are ordered from most similar to least similar. We assign a relevance value of k to the first (most similar) trajectory, decreasing by one per rank down to a relevance value of 1 for the last trajectory. We then compute k-NN’ for the same query trajectory from the dataset with a lower sampling rate. We eliminate all trajectories in the set difference k-NN’ \(\setminus \) k-NN (that is, we keep only k-NN’ \(\cap \) k-NN) and assign a relevance to each remaining trajectory based on its rank in k-NN’.

For example, take \(T_1\) to \(T_{10}\) as reference trajectories and Q as a given query trajectory. Suppose that, under one of the measures, the 4-NN set for Q is \(<T_4, T_9, T_2, T_7>\), ordered from most to least similar (that is, \(T_4\) is the most similar trajectory to Q, \(T_9\) is the second most similar, and so on). We assign relevance values according to similarity rank, \(<4,3,2,1>\), so \(T_4\) has relevance 4, the highest relevance to Q. We also generate a lower-sampled version of \(T_1\) to \(T_{10}\) by randomly choosing \(50\%\) of their sampled points, which yields trajectories \(T'_1\) to \(T'_{10}\), and we extract 4-NN’ for the query trajectory Q from \(T'_1\) to \(T'_{10}\). Ideally, the measure retrieves the same trajectories in the same order, in which case the NDCG of 4-NN and 4-NN’ is 1. If instead the measure returns \(<T_9, T_4, T_5, T_7>\), we eliminate \(T_5\) because it is not in k-NN. The relevance values for the set \(<T_4, T_9, T_2, T_7>\) using k-NN’ are then \(<3,4,0,1>\): \(T_4\) has relevance 3 in k-NN’, \(T_9\) has relevance 4, \(T_2\) is not in k-NN’ and so has relevance 0, and \(T_7\) has relevance 1.

In summary, the ideal relevance for the set \(<T_4, T_9, T_2, T_7>\) under the given measure is \(<4,3,2,1>\), whereas for the lower-sampled version of the trajectories it is \(<3,4,0,1>\). We compute the DCG of \(<4,3,2,1>\) as the ideal DCG (IDCG) and the DCG of \(<3,4,0,1>\) as the given DCG (GDCG), and obtain the NDCG by dividing GDCG by IDCG.
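The worked example above can be reproduced with a short sketch. The appendix does not state which DCG discount is used, so the code below assumes the common form in which the relevance at position i (starting from 1) is divided by \(\log_2(i+1)\); the original formulation of Järvelin and Kekäläinen (2002) uses a slightly different discount, so the exact score may differ from the paper's. The function names (`rank_relevance`, `relevance_vectors`, `dcg`, `ndcg`) are illustrative and do not come from the article.

```python
import math

def rank_relevance(ranking):
    """Map each item of a k-NN ranking to relevance k, k-1, ..., 1."""
    k = len(ranking)
    return {item: k - pos for pos, item in enumerate(ranking)}

def relevance_vectors(knn, knn_prime):
    """Return (ideal, given) relevance lists, both ordered as in knn.

    Items of knn_prime that are not in knn are ignored (eliminated);
    items of knn that are missing from knn_prime get relevance 0.
    """
    ideal_rel = rank_relevance(knn)
    given_rel = rank_relevance(knn_prime)
    ideal = [ideal_rel[t] for t in knn]
    given = [given_rel.get(t, 0) for t in knn]
    return ideal, given

def dcg(relevances):
    # Assumed discount: log2(position + 1), with positions starting at 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(knn, knn_prime):
    """NDCG = GDCG / IDCG for the two rankings."""
    ideal, given = relevance_vectors(knn, knn_prime)
    return dcg(given) / dcg(ideal)

# The example from the text: 4-NN on the original data and
# 4-NN' on the lower-sampled data.
knn = ["T4", "T9", "T2", "T7"]
knn_prime = ["T9", "T4", "T5", "T7"]

ideal, given = relevance_vectors(knn, knn_prime)
score = ndcg(knn, knn_prime)
print(ideal, given, round(score, 3))
```

Under the assumed discount, the relevance vectors match \(<4,3,2,1>\) and \(<3,4,0,1>\) from the text, and identical rankings give an NDCG of exactly 1.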

Figure 13 shows the results for the Cabspotting dataset. As with Spearman’s rank correlation, TIDE and TIDE* achieve higher accuracy than the other measures. However, since NDCG does not penalize “bad” trajectories in the results, the scores are higher than those obtained with Spearman’s rank correlation.

A.2 The impact of estimated maximum speed

As discussed before, when we have no information about the speed limits of an object, we estimate its maximum speed from the sampled points of its trajectories (Sect. 3). In this experiment, we examine the impact of this estimation. Because the estimate may be lower than the true maximum speed, we increase the estimated maximum speed and measure the effect on accuracy. The ground truth is the same as in the previous experiment, but we increase the estimated maximum speed for the lower-sampled version of the Cabspotting dataset. The outcome is that there is no considerable impact on accuracy. As an example, Fig. 14 shows the results for the different-sampling-rates experiment of Fig. 9c. TIDE-Speed1 and TIDE*-Speed1 show the results for the original speed estimate (Fig. 9c), and TIDE-Speed2 and TIDE*-Speed2 show the results for the estimate increased by 25 percent.
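To make the setup concrete, the following sketch shows one plausible way to derive the two speed settings. The estimator of Sect. 3 is not reproduced in this appendix, so the code assumes the simplest option: the largest speed observed between consecutive samples, with planar coordinates and Euclidean distance. The names `estimate_max_speed` and `inflate` and the sample trajectory are hypothetical.

```python
import math

def estimate_max_speed(trajectory):
    """Largest speed between consecutive samples.

    trajectory: list of (x, y, t) tuples sorted by time t.
    Assumption: planar coordinates, so Euclidean distance applies.
    """
    max_speed = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(trajectory, trajectory[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        max_speed = max(max_speed, speed)
    return max_speed

def inflate(speed, percent=25.0):
    """Increase a speed estimate by the given percentage."""
    return speed * (1.0 + percent / 100.0)

# Hypothetical trajectory: the two segments have speeds 5.0 and 12.0.
traj = [(0.0, 0.0, 0.0), (30.0, 40.0, 10.0), (30.0, 100.0, 15.0)]
speed1 = estimate_max_speed(traj)  # analogous to the "Speed1" setting
speed2 = inflate(speed1)           # analogous to the "Speed2" setting (+25%)
print(speed1, speed2)
```

A sparser sampling of the same movement can only lower this kind of estimate, since averaging over longer segments smooths out speed peaks, which is why the experiment tests a deliberately increased value.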

About this article

Cite this article

Naderivesal, S., Kulik, L. & Bailey, J. An effective and versatile distance measure for spatiotemporal trajectories. Data Min Knowl Disc 33, 577–606 (2019). https://doi.org/10.1007/s10618-019-00615-5

Received: 04 November 2017
Accepted: 21 January 2019
Published: 06 February 2019
Issue Date: 15 May 2019
DOI: https://doi.org/10.1007/s10618-019-00615-5
