
Indexable sub-trajectory matching using multi-segment approximation: a partition-and-stitch framework

The Journal of Supercomputing

Abstract

With advances in base technologies for moving objects, many studies have been conducted on the construction of databases of the trajectories of moving objects, including the diverse applications related to the trajectories. Most previous studies deal with whole trajectory matching, which finds the trajectories T in the database similar to a given query trajectory Q ‘as a whole.’ However, we often want to find those T containing the sub-trajectories \(T_{\mathrm{sub}} \, (\subseteq T)\) that are similar to Q. This problem is known as sub-trajectory matching and is more complicated than whole trajectory matching since the query trajectory Q can be of any length and the matching sub-trajectories \(T_{\mathrm{sub}}\) can be at any position in the data trajectories T. In this paper, we present a novel indexing-based sub-trajectory matching algorithm using multi-segment approximation. Our algorithm partitions a data trajectory into multiple component segments and then stores the individual segments in an index. The query trajectory is also partitioned into its component segments, and the search for similar segments for each query segment is efficiently performed using the index. The sub-trajectories similar to the query trajectory are reconstructed by our ‘stitching’ algorithm using the individual segments retrieved from the index. Our stitching algorithm is novel and innovative in the sense that it facilitates segment-wise partitioning and indexing of data trajectories. Without stitching, only trajectory-wise operations would be affordable, which causes severe storage space overhead and degradation in search performance. Our study is the first that uses indexing in sub-trajectory matching. We define a (multi-segment) trajectory similarity measure that extends a widely used single-segment similarity measure proposed by Lee et al. (in: Proceedings of ACM SIGMOD international conference on management of data (SIGMOD), 2007; in: Proceedings of IEEE international conference on data engineering (ICDE), 2008; Proc VLDB Endow (PVLDB) 1(1):1081–1094, 2008) by using the Hausdorff distance. We perform extensive experiments to compare our method with EDS (Xie, in: Proceedings of ACM SIGMOD international conference on management of data (SIGMOD), 2014), which has been shown to outperform all representative point-based measures in terms of accuracy and performance. The accuracy of our similarity measure is better than that of EDS by up to 52.0%, and our algorithm outperforms the one using EDS by up to 22,543 times. The performance of our algorithm is linearly scalable in the size of the database, which is an essential property for handling large-scale databases.
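The abstract names the Hausdorff distance as the building block of the multi-segment measure but does not spell out the measure itself. As a minimal sketch only, the following Python code computes the standard Hausdorff distance between two straight 2D segments. The function names and the tuple representation of points are assumptions made for illustration, and this is not the paper's similarity measure. The sketch relies on the fact that the distance from a point to a convex set is a convex function, so the farthest point of one segment from the other is always an end point.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment ab (2D tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the orthogonal projection of p, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def segment_hausdorff(seg_a, seg_b):
    """Hausdorff distance between two segments, each given as (start, end)."""
    a0, a1 = seg_a
    b0, b1 = seg_b
    # Directed distances: the maximum is attained at an end point.
    a_to_b = max(point_segment_distance(a0, b0, b1),
                 point_segment_distance(a1, b0, b1))
    b_to_a = max(point_segment_distance(b0, a0, a1),
                 point_segment_distance(b1, a0, a1))
    return max(a_to_b, b_to_a)
```

For example, `segment_hausdorff(((0, 0), (4, 0)), ((1, 0), (2, 0)))` is driven by the end point `(4, 0)` of the longer segment, which lies farthest from the shorter one.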

Notes

  1. Some arrows are omitted to avoid clutter.

  2. http://research.microsoft.com/en-us/projects/geolife/.

  3. http://www.nhc.noaa.gov/data/.

  4. http://www.chorochronos.org/?q=node/5.

References

  1. Alamri S, Taniar D, Safar M (2014) A taxonomy for moving object queries in spatial databases. Future Gener Comput Syst 37:232–242

  2. Atev S, Miller G, Papanikolopoulos NP (2010) Clustering of vehicle trajectories. IEEE Trans Intell Transp Syst 11(3):647–657

  3. Beckmann N, Seeger B (2009) A revised R*-tree in comparison with related index structures. In: Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 799–812

  4. Buchin K, Buchin M, Kreveld MV, Luo J (2009) Finding long and similar parts of trajectories. In: Proceedings of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS), pp 296–305

  5. Chen J, Leung MKH, Gao Y (2003) Noisy logo recognition using line segment Hausdorff distance. Pattern Recognit 36(4):943–955

  6. Chen L, Ng R (2004) On the marriage of Lp-norms and edit distance. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp 792–803

  7. Chen L, Ozsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 491–502

  8. Ding X, Chen L, Gao Y, Jensen CS, Bao H (2018) UlTraMan: a unified platform for big trajectory data management and analytics. Proc VLDB Endow 11(7):787–799

  9. Dong Y, Pi D (2018) Novel privacy-preserving algorithm based on frequent path for trajectory data publishing. Knowl Based Syst 148:55–65

  10. Eberly DH (2006) 3D game engine design: a practical approach to real-time computer graphics, 2nd edn. Morgan Kaufmann, Burlington

  11. Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 226–231

  12. Frentzos E, Gratsias K, Theodoridis Y (2007) Index-based most similar trajectory search. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp 816–825

  13. Hung C-C, Peng W-C, Lee W-C (2015) Clustering and aggregating clues of trajectories for mining trajectory patterns and routes. VLDB J 24(2):169–192

  14. Huttenlocher DP, Kedem K (1990) Computing the minimum Hausdorff distance for point sets under translation. In: Proceedings of ACM annual symposium on computational geometry (SCG), pp 340–349

  15. Kaplan E, Gürsoy ME, Nergiz ME, Saygin Y (2018) Location disclosure risks of releasing trajectory distances. Data Knowl Eng 113:43–63

  16. Lee J-G, Han J, Whang K-Y (2007) Trajectory clustering: a partition-and-group framework. In: Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 593–604

  17. Lee J-G, Han J, Li X (2008) Trajectory outlier detection: a partition-and-detect framework. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp 140–149

  18. Lee J-G, Han J, Li X, Gonzalez H (2008) TraClass: trajectory classification using hierarchical region-based and trajectory-based clustering. Proc VLDB Endow (PVLDB) 1(1):1081–1094

  19. Mao J, Sun P, Jin C, Zhou A (2018) Outlier detection over distributed trajectory streams. In: Proceedings of SIAM International Conference on Data Mining (SDM), San Diego, pp 64–72

  20. Mao Y, Zhong H, Xiao X, Li X (2017) A segment-based trajectory similarity measure in the urban transportation systems. Sensors 17(3):524

  21. Nutanong S, Jacox EH, Samet H (2011) An incremental Hausdorff distance calculation algorithm. Proc VLDB Endow (PVLDB) 4(8):506–517

  22. Pelekis N, Tampakis P, Vodas M, Doulkeridis C, Theodoridis Y (2017) On temporal-constrained sub-trajectory cluster analysis. Data Min Knowl Discov (DMKD) 31(5):1294–1330

  23. Ranu S, Deepak P, Telang AD, Deshpande P, Raghavan S (2015) Indexing and matching trajectories under inconsistent sampling rates. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp 999–1010

  24. Shang Z, Li G, Bao Z (2018) DITA: distributed in-memory trajectory analytics. In: Proceedings of International Conference on Management of Data (SIGMOD), Houston, pp 725–740

  25. Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp 673–684

  26. Wolfson O, Xu B, Chamberlain S, Jiang L (1998) Moving objects databases: issues and solutions. In: Proceedings of IEEE International Conference on Scientific and Statistical Database Management, pp 111–122

  27. Xie M (2014) EDS: a segment-based distance measure for sub-trajectory similarity search. In: Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1609–1610

  28. Xie D, Li F, Phillips JM (2017) Distributed trajectory similarity search. Proc VLDB Endow (PVLDB) 10(11):1478–1489

  29. Yi B-K, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp 201–208

  30. Yuan G, Sun P, Zhao J, Li D, Wang C (2017) A review of moving object trajectory clustering algorithms. Artif Intell Rev 47(1):123–144

  31. Zheng Y, Zhang L, Xie X, Ma W-Y (2009) Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of International Conference on World Wide Web (WWW), pp 791–800

  32. Zheng Y, Zhou X (eds) (2011) Computing with spatial trajectories. Springer, Berlin


Author information

Corresponding authors

Correspondence to Woong-Kee Loh or Kyu-Young Whang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Research Foundation of Korea (NRF) grant funded by Korean Government (MSIT) (No. 2016R1A2B4015929).

Appendix

Lemma 5

For any two segments \(L_i\) and \(L_j\), the following always holds:

$$\begin{aligned} {\mathrm{dist}}(L_i, L_j) \ge d_{{\mathrm{seg}},0}(L_i, L_j), \end{aligned}$$
(11)

where \({\mathrm{dist}}(L_i, L_j) = w_\perp \cdot d_\perp (L_i, L_j) + w_\parallel \cdot d_\parallel (L_i, L_j) + w_\theta \cdot d_\theta (L_i, L_j)\) and \(w_\perp = w_\parallel = w_\theta = 1\) [16].
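The inequality in Lemma 5 can be spot-checked numerically. The Python sketch below is illustrative only and rests on two assumptions that this excerpt does not confirm: that \(d_{{\mathrm{seg}},0}\) is the minimum Euclidean distance between the two segments (consistent with how the proof uses it), and that \(d_\perp\), \(d_\parallel\), and \(d_\theta\) are the perpendicular, parallel, and angle distances as we understand them from [16]. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def cross(o, p, q):
    """z-component of (p - o) x (q - o)."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def segment_min_distance(seg_i, seg_j):
    """Minimum distance between two closed 2D segments (assumed d_seg,0)."""
    (si, ei), (sj, ej) = seg_i, seg_j
    # Proper crossing: end points of each segment lie strictly on opposite
    # sides of the other segment, so the segments share a point.
    if (cross(si, ei, sj) * cross(si, ei, ej) < 0
            and cross(sj, ej, si) * cross(sj, ej, ei) < 0):
        return 0.0
    # Otherwise the minimum is attained at an end point of one segment.
    return min(point_segment_distance(sj, si, ei),
               point_segment_distance(ej, si, ei),
               point_segment_distance(si, sj, ej),
               point_segment_distance(ei, sj, ej))


def project_onto_line(p, a, b):
    """Orthogonal projection of p onto the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    return (a[0] + t * dx, a[1] + t * dy)


def segment_dist(seg_i, seg_j):
    """dist = d_perp + d_par + d_theta with unit weights (our reading of [16]).

    Assumes the longer of the two segments has non-zero length.
    """
    # L_i is taken to be the longer segment, so swap if needed.
    if math.dist(*seg_i) < math.dist(*seg_j):
        seg_i, seg_j = seg_j, seg_i
    (si, ei), (sj, ej) = seg_i, seg_j
    ps = project_onto_line(sj, si, ei)
    pe = project_onto_line(ej, si, ei)
    l_perp1, l_perp2 = math.dist(sj, ps), math.dist(ej, pe)
    total = l_perp1 + l_perp2
    d_perp = (l_perp1 ** 2 + l_perp2 ** 2) / total if total > 0 else 0.0
    # d_par: smaller of the two projections' distances to the nearer end of L_i.
    d_par = min(math.dist(ps, si), math.dist(ps, ei),
                math.dist(pe, si), math.dist(pe, ei))
    vi = (ei[0] - si[0], ei[1] - si[1])
    vj = (ej[0] - sj[0], ej[1] - sj[1])
    if vi[0] * vj[0] + vi[1] * vj[1] > 0:
        # Angle below 90 degrees: ||L_j|| * sin(theta).
        d_theta = abs(vi[0] * vj[1] - vi[1] * vj[0]) / math.hypot(*vi)
    else:
        # Angle of 90 degrees or more: ||L_j||.
        d_theta = math.hypot(*vj)
    return d_perp + d_par + d_theta
```

For two parallel segments such as `((0, 0), (4, 0))` and `((1, 1), (3, 1))`, `segment_dist` adds a perpendicular and a parallel component of 1 each, while `segment_min_distance` is 1, which agrees with inequality (11) under the assumptions above.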

Fig. 17 Similarity measure between two segments \(L_i\) and \(L_j\)

Proof

Let \({\mathcal {L}}_i\) and \({\mathcal {L}}_j\) be the lines containing the segments \(L_i\) and \(L_j\), respectively. Let \(p_s\) and \(p_e\) be the projection points of the two end points \(s_j\) and \(e_j\) of \(L_j\) onto \({\mathcal {L}}_i\), respectively. Without loss of generality, we assume \(d(s_j, p_s) \le d(e_j, p_e)\). We also assume that \(L_i\) is longer than \(L_j\) as in [16]. We prove the lemma by considering the following three cases:

Case 1: \(p_s\) is located on \(L_i\).

As shown in Fig. 17a, \(d_{{\mathrm{seg}},0}(L_i, L_j) = l_{\perp 1}\). Thus,

$$\begin{aligned} {\mathrm{dist}}(L_i, L_j)&\ge {} d_\perp (L_i, L_j) = \frac{l_{\perp 1}^2 + l_{\perp 2}^2}{ l_{\perp 1} + l_{\perp 2}} \\&\ge {} \frac{l_{\perp 1}^2 + l_{\perp 1} l_{\perp 2}}{ l_{\perp 1} + l_{\perp 2}} = l_{\perp 1} \\&= {} d_{{\mathrm{seg}},0}(L_i, L_j). \end{aligned}$$
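The second inequality in Case 1 uses \(l_{\perp 1} \le l_{\perp 2}\). Assuming, as in [16], that \(l_{\perp 1} = d(s_j, p_s)\) and \(l_{\perp 2} = d(e_j, p_e)\) (Fig. 17 is not reproduced here, so this reading is an assumption), this follows from \(d(s_j, p_s) \le d(e_j, p_e)\). The difference of the two numerators is then non-negative:

$$\begin{aligned} \left( l_{\perp 1}^2 + l_{\perp 2}^2 \right) - \left( l_{\perp 1}^2 + l_{\perp 1} l_{\perp 2} \right) = l_{\perp 2} \left( l_{\perp 2} - l_{\perp 1} \right) \ge 0. \end{aligned}$$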

Case 2: \(p_s\) is located behind \(e_i\).

As shown in Fig. 17b, \(d_{{\mathrm{seg}},0}(L_i, L_j) = d(e_i, s_j)\). Thus,

$$\begin{aligned} {\mathrm{dist}}(L_i, L_j)&\ge {} d_{\perp }(L_i, L_j) + d_\parallel (L_i, L_j) \\&\ge {} l_{\perp 1} + l_{\parallel 2} \ge d(e_i, s_j) \\&= {} d_{{\mathrm{seg}},0}(L_i, L_j). \end{aligned}$$

Case 3: \(p_s\) is located in front of \(s_i\).

Let \(p_j\) be the projection point of the end point \(s_i\) of \(L_i\) onto \({\mathcal {L}}_j\); then \(d_{{\mathrm{seg}},0}(L_i, L_j) \le d(s_i, p_j)\). If it holds that \(l_{\parallel 1}' \le l_{\parallel 1}''\), then \(d_{\parallel }(L_i, L_j) = l_{\parallel 1}'\). Thus,

$$\begin{aligned} {\mathrm{dist}}(L_i, L_j)&\ge {} d_\perp (L_i, L_j) + d_\parallel (L_i, L_j) \ge l_{\perp 1} + l_{\parallel 1}' \\&\ge {} d(s_i, s_j) \ge d(s_i, p_j) \\&\ge {} d_{{\mathrm{seg}},0}(L_i, L_j). \end{aligned}$$

If it holds that \(l_{\parallel 1}' \ge l_{\parallel 1}''\), then \(d_\parallel (L_i, L_j) = l_{\parallel 1}''\). Thus,

$$\begin{aligned} {\mathrm{dist}}(L_i, L_j)&= {} d_\perp (L_i, L_j) + d_\parallel (L_i, L_j) + d_\theta (L_i, L_j) \\&\ge {} l_{\perp 1} + l_{\parallel 1}'' + d_\theta \ge d(s_i, e_j) \ge d(s_i, p_j) \\&\ge {} d_{{\mathrm{seg}},0}(L_i, L_j). \end{aligned}$$

Therefore, combining these three cases, it always holds that \({\mathrm{dist}}(L_i, L_j) \ge d_{{\mathrm{seg}},0}(L_i, L_j)\). \(\square \)

Cite this article

Yoo, JJ., Loh, WK. & Whang, KY. Indexable sub-trajectory matching using multi-segment approximation: a partition-and-stitch framework. J Supercomput 75, 6129–6157 (2019). https://doi.org/10.1007/s11227-019-02813-w

  • DOI: https://doi.org/10.1007/s11227-019-02813-w
