ABSTRACT
Trajectory data analytics plays an important role in many applications, such as transportation optimization, urban planning, and taxi scheduling. A major challenge, however, is that the time cost of processing queries on big datasets is too high. In this paper, we demonstrate Ratel, a distributed in-memory framework based on Spark for analyzing large-scale trajectories. Ratel groups trajectories into partitions by considering data locality and load balance. We build R-tree-based global indexes to prune partitions when applying trajectory search and join. Within each partition, Ratel uses a filter-refinement method to efficiently find similar trajectories. We show three scenarios: bus station planning, route recommendation, and transportation analytics. Demo attendees can interact with a web UI, pose different queries on the dataset, and navigate the query results.
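The partition pruning and filter-refinement steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not Ratel's implementation: it assumes trajectories are lists of (x, y) tuples, replaces the R-tree global index with a linear scan over partition bounding boxes, and uses the discrete Fréchet distance as the similarity measure. All function names (`mbr`, `mbr_min_dist`, `discrete_frechet`, `similarity_search`) are hypothetical.

```python
from math import hypot


def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of a point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_min_dist(a, b):
    """Smallest possible distance between a point in box a and a point in box b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def point_dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


def discrete_frechet(a, b):
    """Discrete Frechet distance via dynamic programming, O(len(a) * len(b))."""
    prev = []
    for i in range(len(a)):
        cur = []
        for j in range(len(b)):
            d = point_dist(a[i], b[j])
            if i == 0 and j == 0:
                cur.append(d)
            elif i == 0:
                cur.append(max(cur[j - 1], d))
            elif j == 0:
                cur.append(max(prev[j], d))
            else:
                cur.append(max(min(prev[j], prev[j - 1], cur[j - 1]), d))
        prev = cur
    return prev[-1]


def similarity_search(partitions, query, threshold):
    """Return trajectories whose discrete Frechet distance to query is <= threshold.

    partitions is a list of partitions, each a list of trajectories.
    """
    query_box = mbr(query)
    results = []
    for part in partitions:
        # Global pruning (stand-in for the R-tree index): every point of a
        # match must lie within `threshold` of some query point, so a
        # partition whose box is farther away than that cannot contain one.
        part_box = mbr([p for traj in part for p in traj])
        if mbr_min_dist(query_box, part_box) > threshold:
            continue
        for traj in part:
            # Filter: the Frechet coupling always pairs first with first and
            # last with last, so the larger endpoint gap is a lower bound.
            lower_bound = max(point_dist(traj[0], query[0]),
                              point_dist(traj[-1], query[-1]))
            if lower_bound > threshold:
                continue
            # Refinement: compute the exact distance only for survivors.
            if discrete_frechet(traj, query) <= threshold:
                results.append(traj)
    return results
```

Because both the bounding-box distance and the endpoint gap are lower bounds on the discrete Fréchet distance, the pruning and filter steps never discard a true match; they only avoid the quadratic refinement for candidates that cannot qualify.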
Index Terms
- Ratel: Interactive Analytics for Large Scale Trajectories