An effective and versatile distance measure for spatiotemporal trajectories

Data Mining and Knowledge Discovery

Abstract

The analysis of large-scale trajectory data has tremendous benefits for applications ranging from transportation planning to traffic management. A fundamental building block for the analysis of such data is the computation of similarity between trajectories. Existing work on similarity computation focuses mainly on the spatial aspects of trajectories and more rarely takes time into account in conjunction with space. A key challenge when considering time is how to handle trajectories that are sampled asynchronously or at variable rates, which can lead to uncertainty. To tackle this problem, we quantify trajectory similarity as an interval, rather than a single value, to capture the uncertainty that can result from different sampling rates and asynchronous sampling. Based on this perspective, we develop a new trajectory similarity measure, Trajectory Interval Distance Estimation (TIDE), which models similarity computation as a convex optimisation problem. Using two real datasets, we demonstrate that our proposed measure is extremely effective for assessing similarity in comparison to existing state-of-the-art measures.

Notes

  1. https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm. Accessed 20 Jun 2018.

References

  • Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. In: International conference on foundations of data organization and algorithms. Springer, Heidelberg, pp 69–84

  • Alt H, Godau M (1995) Computing the Fréchet distance between two polygonal curves. Int J Comput Geom Appl 5(01n02):75–91

  • Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. KDD Workshop 10(16):359–370

  • Biagioni J, Eriksson J (2012) Map inference in the face of noise and disparity. In: Proceedings of the 20th international conference on advances in geographic information systems. ACM, pp 79–88

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  • Chen L, Ng R (2004) On the marriage of lp-norms and edit distance. In: Proceedings of the 30th international conference on very large data bases, vol 30. VLDB Endowment, pp 792–803

  • Chen L, Ozsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM international conference on management of data (ACM SIGMOD). ACM, pp 491–502

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the 30th international conference on very large data bases, vol 1, no 2. VLDB Endowment, pp 1542–1552

  • Eiter T, Mannila H (1994) Computing discrete Fréchet distance. In: Tech. report CD-TR 94/64, Information Systems Department, Technical University of Vienna

  • Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):12

  • Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. ACM SIGMOD Rec 23(2):419–429

  • Frentzos E, Gratsias K, Theodoridis Y (2007) Index-based most similar trajectory search. In: Proceedings of the 23rd international conference on data engineering (ICDE). IEEE, pp 816–825

  • Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst 20(4):422–446

  • Keogh EJ, Pazzani MJ (1998) An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. KDD 98(1):239–243

  • Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386

  • Kuijpers B, Moelans B, Othman W, Vaisman A (2009) Analyzing trajectories using uncertainty and background information. In: International symposium on spatial and temporal databases. Springer, Heidelberg, pp 135–152

  • Laube P, Imfeld S (2002) Analyzing relative motion within groups of trackable moving point objects. In: International conference on geographic information science. Springer, Heidelberg, pp 132–144

  • Lee JG, Han J, Whang KY (2007) Trajectory clustering: a partition-and-group framework. In: Proceedings of the international conference on management of data (ACM SIGMOD). ACM, pp 593–604

  • Lin B, Su J (2005) Shapes based trajectory queries for moving objects. In: Proceedings of the 13th annual ACM international workshop on geographic information systems. ACM, pp 21–30

  • Mamoulis N, Cao H, Kollios G, Hadjieleftheriou M, Tao Y, Cheung DW (2004) Mining, indexing, and querying historical spatiotemporal data. In: Proceedings of the 10th international conference on knowledge discovery and data mining (ACM SIGKDD). ACM, pp 236–245

  • Paparrizos J, Gravano L (2015) k-shape: Efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 1855–1870

  • Pelekis N, Kopanakis I, Marketos G, Ntoutsi I, Andrienko G, Theodoridis Y (2007) Similarity search in trajectory databases. In: 14th international symposium on temporal representation and reasoning. IEEE, pp 129–140

  • Piorkowski M, Sarafijanovic-Djukic N, Grossglauser M (2009) A parsimonious model of mobile partitioned networks with clustering. In: First international communication systems and networks and workshops. IEEE, pp 1–10

  • Ranu S, Deepak P, Telang AD, Deshpande P, Raghavan S (2015) Indexing and matching trajectories under inconsistent sampling rates. In: Proceedings of the IEEE 31st international conference on data engineering (ICDE). IEEE, pp 999–1010

  • Su H, Zheng K, Wang H, Huang J, Zhou X (2013) Calibrating trajectory data for similarity-based analysis. In: Proceedings of the ACM international conference on management of data (ACM SIGMOD). ACM, pp 833–844

  • Tang B, Yiu ML, Mouratidis K, Wang K (2017) Efficient motif discovery in spatial trajectories using discrete Fréchet distance. In: International conference on extending database technology (EDBT)

  • Trajcevski G, Ding H, Scheuermann P, Tamassia R, Vaccaro D (2007) Dynamics-aware similarity of moving objects trajectories. In: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems. ACM, p 11

  • Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (ACM SIGKDD). ACM, pp 216–225

  • Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of the 18th international conference on data engineering (ICDE). IEEE, pp 673–684

  • Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: Proceedings of 18th international conference on advances in geographic information systems. ACM, pp 99–108

  • Zheng K, Trajcevski G, Zhou X, Scheuermann P (2011) Probabilistic range queries for uncertain trajectories on road networks. In: Proceedings of the 14th international conference on extending database technology. ACM, pp 283–294

  • Zheng K, Zheng Y, Xie X, Zhou X (2012) Reducing uncertainty of low-sampling-rate trajectories. In: IEEE 28th international conference on data engineering (ICDE). IEEE, pp 1144–1155

Author information

Corresponding author

Correspondence to Somayeh Naderivesal.

Additional information

Responsible editor: Srinivasan Parthasarathy.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

A.1 Accuracy using NDCG measure

In this section, we use Normalized Discounted Cumulative Gain (NDCG) (Järvelin and Kekäläinen 2002) to examine the accuracy of the different measures discussed in Sect. 5 on the Cabspotting dataset.

For every measure, as in Sect. 5, the ground truth is built by computing the set of k nearest neighbours (k-NN) of a given query trajectory from a given set of original reference trajectories DB. The trajectories in k-NN are ordered from the most similar to the least similar. We assign a relevance value of k to the first (most similar) trajectory, decreasing by one per rank down to a relevance value of 1 for the last trajectory. We then compute k-NN’ for the same query trajectory from the dataset with a lower sampling rate and eliminate all trajectories in k-NN’ \(\setminus \) k-NN (that is, k-NN’ = k-NN’ \(\cap \) k-NN). Each remaining trajectory is assigned a relevance based on its rank in k-NN’.

For example, take \(T_1\) to \(T_{10}\) as reference trajectories and Q as a given query trajectory. Suppose that, using one of the measures, the 4-NN set for Q is \(\langle T_4, T_9, T_2, T_7\rangle \), listed from most to least similar (that is, \(T_4\) is the most similar trajectory to Q, \(T_9\) is the second most similar, and so on). We assign relevance values based on similarity rank, \(\langle 4,3,2,1\rangle \), so \(T_4\) has a relevance value of 4, the highest relevance to Q. We also generate a lower-sampled version of \(T_1\) to \(T_{10}\) by randomly choosing \(50\%\) of their sampled points, which gives trajectories \(T'_1\) to \(T'_{10}\). We then extract 4-NN’ for the query trajectory Q from \(T'_1\) to \(T'_{10}\). In the ideal situation, the measure extracts the same set of similar trajectories in the same order, so that 4-NN and 4-NN’ yield an NDCG of 1. However, if the measure instead extracts \(\langle T_9, T_4, T_5, T_7\rangle \) as the most similar trajectories in order, we eliminate \(T_5\) because it is not in k-NN. The relevance values for the set \(\langle T_4, T_9, T_2, T_7\rangle \) using k-NN’ are then \(\langle 3,4,0,1\rangle \): \(T_4\) has a relevance of 3 in k-NN’, \(T_9\) has a relevance of 4, \(T_2\) is not in k-NN’, and \(T_7\) has a relevance of 1.

In other words, the ideal relevance for the set \(\langle T_4, T_9, T_2, T_7\rangle \) using the given measure is \(\langle 4,3,2,1\rangle \), whereas for the lower-sampled version of the trajectories it is \(\langle 3,4,0,1\rangle \). We compute the DCG of \(\langle 4,3,2,1\rangle \) as the ideal DCG (IDCG) and the DCG of \(\langle 3,4,0,1\rangle \) as the given DCG (GDCG). Finally, we divide GDCG by IDCG to obtain the NDCG.
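The procedure above can be illustrated with a minimal Python sketch. The text does not state which DCG variant is used, so the sketch assumes the common form in which the relevance at rank i is divided by log2(i + 1); the function names `dcg`, `relevance_in_lower_sampled` and `ndcg` are hypothetical and chosen only for this illustration.

```python
import math


def dcg(relevances):
    # Assumed DCG variant: relevance at rank i (1-based) divided by log2(i + 1).
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def relevance_in_lower_sampled(knn, knn_prime):
    # Relevance of each trajectory is k at the top of k-NN' down to 1 at the bottom.
    k = len(knn)
    rel_by_rank = {traj: k - i for i, traj in enumerate(knn_prime)}
    # Trajectories of k-NN that are missing from k-NN' get relevance 0;
    # trajectories only in k-NN' (such as T5) are never looked up, i.e. eliminated.
    return [rel_by_rank.get(traj, 0) for traj in knn]


def ndcg(knn, knn_prime):
    ideal = list(range(len(knn), 0, -1))  # <k, ..., 2, 1>
    given = relevance_in_lower_sampled(knn, knn_prime)
    return dcg(given) / dcg(ideal)  # GDCG / IDCG


# Example from the text: 4-NN on the original data, 4-NN' on the lower-sampled data.
knn = ["T4", "T9", "T2", "T7"]
knn_prime = ["T9", "T4", "T5", "T7"]
print(relevance_in_lower_sampled(knn, knn_prime))  # [3, 4, 0, 1]
print(round(ndcg(knn, knn_prime), 3))
```

Under the assumed discount, the example gives an NDCG of roughly 0.81; a different logarithm base or discount definition would change the exact value but not the procedure.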

Fig. 13 NDCG for Cabspotting data

Figure 13 shows the results for the Cabspotting dataset. As with Spearman’s rank correlation, TIDE and TIDE* achieve higher accuracy than the other measures. However, since NDCG does not penalize “bad” trajectories in the results, the scores are higher than those obtained with Spearman’s rank correlation.

A.2 The impact of estimated maximum speed

As discussed before, when we have no information about the speed limits of an object, we estimate its maximum speed from the sampled points of its trajectories (Sect. 3). In this experiment, we examine the impact of this estimation. Because the estimate may be lower than the true maximum speed, we increase the estimated maximum speed and observe the effect on accuracy. The ground truth is the same as in the previous experiment; however, we increase the estimated maximum speed for the lower-sampled version of the Cabspotting dataset. We find no considerable impact on accuracy. As an example, Fig. 14 shows the results for the different-sampling-rates experiment of Fig. 9c. TIDE-Speed1 and TIDE*-Speed1 show the results for the original speed estimate (Fig. 9c), and TIDE-Speed2 and TIDE*-Speed2 show the results for the speed estimate increased by 25 percent.
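As a rough illustration of the two speed settings, the following Python sketch estimates a maximum speed from sampled points and then increases it by 25 percent. The actual estimation procedure is defined in Sect. 3 and is not reproduced here; the sketch assumes, purely for illustration, that the estimate is the largest average speed between consecutive samples, that points are `(t, x, y)` tuples in planar coordinates, and that the hypothetical helper `estimate_max_speed` stands in for the paper's method.

```python
import math


def estimate_max_speed(trajectory):
    # trajectory: list of (t, x, y) samples sorted by time (assumed format).
    # Assumption: the maximum speed is approximated by the largest
    # average speed observed between two consecutive sampled points.
    best = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        best = max(best, math.hypot(x1 - x0, y1 - y0) / dt)
    return best


# Toy trajectory: segments of length 50 and 60, each covered in 10 time units.
trajectory = [(0, 0.0, 0.0), (10, 30.0, 40.0), (20, 30.0, 100.0)]

speed1 = estimate_max_speed(trajectory)  # original estimate (Speed1)
speed2 = 1.25 * speed1                   # estimate increased by 25 percent (Speed2)
print(speed1, speed2)  # 6.0 7.5
```

An estimate derived from sampled points can only fall below the true maximum, which is why the experiment tests a larger value rather than a smaller one.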

Fig. 14 The average Spearman’s rank correlation with the original (speed1) and increased (speed2) estimated speed for Cabspotting data

Cite this article

Naderivesal, S., Kulik, L. & Bailey, J. An effective and versatile distance measure for spatiotemporal trajectories. Data Min Knowl Disc 33, 577–606 (2019). https://doi.org/10.1007/s10618-019-00615-5
