Abstract
This paper proposes a heterogeneous spatial-temporal similarity search framework for datasets that come from multiple asynchronous data sources. Because of measurement error, data loss, and other factors, similarity search based on single points along a trajectory usually cannot meet the accuracy requirements of this heterogeneous setting. To address this issue, we introduce the concept of a spatial-temporal cluster of points, which, in place of single points, can be identified for each target query. Following this concept, we design a spectral clustering algorithm that constructs the clusters effectively in the pre-processing phase. Query processing then improves search accuracy by unifying multiple search metrics. To validate the idea, we prototype a clustered online spatial-temporal similarity search system, "Osprey", which computes the similarity of spatial-temporal sequences in parallel for heterogeneous search on a distributed database. Our empirical study uses an open dataset, "T-Drive", and a billion-scale dataset of WiFi positioning records gathered from the urban metro system in Shenzhen, China. The experimental results show that the latency of the proposed system is less than 4 s in most cases, and that the accuracy is more than 70% when the similarity exceeds 0.5.
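The motivation for comparing clusters rather than single points can be illustrated with a minimal sketch. It is not the Osprey implementation: the spectral clustering step is replaced here by a fixed space-time grid, and a single Jaccard score stands in for the unified search metrics. The function names `cell_of`, `to_clusters`, and `jaccard`, the cell sizes, and the sample coordinates are all assumptions made for illustration.

```python
from math import floor

# Hypothetical cell sizes: 0.01 degrees in space, 300 seconds in time.
SPACE_STEP = 0.01
TIME_STEP = 300


def cell_of(point):
    """Map a (lon, lat, t) point to a coarse space-time cell id."""
    lon, lat, t = point
    return (floor(lon / SPACE_STEP), floor(lat / SPACE_STEP), floor(t / TIME_STEP))


def to_clusters(trajectory):
    """Represent a trajectory by the set of cells its points fall into.

    The grid is a stand-in for the clusters that the paper builds with
    spectral clustering; it is an assumption, not the authors' method.
    """
    return {cell_of(p) for p in trajectory}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up data: two sources observe the same trip with small position
# offsets and asynchronous timestamps.
gps = [(114.0512, 22.5431, 10), (114.0634, 22.5437, 320), (114.0758, 22.5442, 640)]
wifi = [(114.0518, 22.5436, 45), (114.0639, 22.5433, 350), (114.0753, 22.5448, 700)]

exact_match = jaccard(set(gps), set(wifi))                    # point-level comparison
cluster_match = jaccard(to_clusters(gps), to_clusters(wifi))  # cluster-level comparison
```

In this toy input no raw point from one source equals a point from the other, so the point-level score is 0.0, while every pair of corresponding points falls into the same cell, so the cluster-level score is 1.0.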

Notes
The paths up and down a viaduct are in fact different paths, even though the Euclidean distance between them is small.
Acknowledgements
This work is supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164002) and the National Natural Science Foundation of China (No. 61672513).
Cite this article
Dai, H., Wang, Y. & Xu, C. Osprey: a heterogeneous search framework for spatial-temporal similarity. Computing 104, 1949–1975 (2022). https://doi.org/10.1007/s00607-022-01075-4
DOI: https://doi.org/10.1007/s00607-022-01075-4