Abstract
Web big data contains a wealth of valuable information that can be extracted through web mining and knowledge extraction. In particular, real-time location information from the web provides a richer computational basis for existing applications, such as real-time monitoring systems and recommendation systems based on real-time trajectory clustering. However, because a trajectory is a sequence of user positions ordered in time, computing correlations between trajectories incurs a massive computational cost. In addition, trajectory data is usually time-sensitive: once trajectory data is generated or changes, the corresponding clustering results must be output with low latency. Although offline trajectory clustering has been well studied, directly extending such work to an online environment tends to incur (1) expensive network cost, (2) high processing latency, and (3) low-accuracy results. To enable real-time clustering on trajectory streams, we propose a distributed cLustering framework for hexagonal-based streaming trajectories (Lunatory). Lunatory comprises three key components: (1) Simplifier: to reduce the extensive network transmission in a distributed trajectory streaming system, a pivot trajectory data structure simplifies trajectories by reducing the number of samples and extracting key features; (2) Partitioner: to improve the local computational efficiency of subsequent clustering, a hexagonal-based indexing strategy indexes the pivot trajectories; (3) Executor: to produce clusters in real time, DBSCAN is extended to pivot trajectories and implemented on Flink. Empirical studies on real-world data validate the usefulness of our proposal and demonstrate a substantial advantage of our approach over available solutions in the literature.
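To make the first two components concrete, the following is a minimal Python sketch under stated assumptions, not the implementation described in the paper. The abstract does not define the pivot trajectory structure, so Douglas-Peucker simplification is swapped in as a stand-in for the Simplifier, and a plain axial-coordinate hexagonal grid over planar (x, y) points stands in for the Partitioner's index. The function names `simplify`, `hex_cell`, and `partition`, the tolerance `tol`, and the cell `size` are all hypothetical. The Executor (DBSCAN on Flink) is omitted.

```python
import math
from collections import defaultdict


def simplify(traj, tol):
    """Douglas-Peucker simplification (an assumed stand-in for pivot trajectories).

    traj is a list of planar (x, y) points; tol is the maximum allowed
    perpendicular deviation from the simplified polyline.
    """
    if len(traj) < 3:
        return list(traj)
    (x1, y1), (x2, y2) = traj[0], traj[-1]
    seg = math.hypot(x2 - x1, y2 - y1)

    def deviation(p):
        # Distance from p to the line through the two endpoints.
        if seg == 0:
            return math.hypot(p[0] - x1, p[1] - y1)
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg

    # Find the interior point that deviates most from the endpoint chord.
    idx, dmax = max(
        ((i, deviation(p)) for i, p in enumerate(traj[1:-1], 1)),
        key=lambda t: t[1],
    )
    if dmax <= tol:
        return [traj[0], traj[-1]]
    # Keep that point and recurse on both halves (drop the duplicated joint).
    return simplify(traj[: idx + 1], tol)[:-1] + simplify(traj[idx:], tol)


def hex_cell(p, size):
    """Map a planar point to the axial (q, r) id of a pointy-top hexagon.

    size is the centre-to-corner distance of each hexagon (hypothetical units).
    """
    x, y = p
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Cube rounding: recompute the coordinate with the largest rounding error
    # so that q + r + s == 0 still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def partition(pivots_by_id, size):
    """Bucket trajectory ids by the hexagonal cells their points fall into."""
    buckets = defaultdict(set)
    for tid, points in pivots_by_id.items():
        for p in points:
            buckets[hex_cell(p, size)].add(tid)
    return buckets


if __name__ == "__main__":
    raw = {
        "a": [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)],
        "b": [(1, 1), (2, 2), (3, 3)],
        "c": [(100, 100), (110, 110)],
    }
    pivots = {tid: simplify(t, tol=0.1) for tid, t in raw.items()}
    for cell, ids in sorted(partition(pivots, size=10).items()):
        print(cell, sorted(ids))
```

In this sketch, only trajectories that share a cell would need to be compared in a later density-based clustering step, which is one way to read the Partitioner's goal of improving local computational efficiency. A production system would more likely use a geodesic hexagonal index such as H3 rather than a planar grid.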
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61802273, 62102277), the Postdoctoral Science Foundation of China (No. 2020M681529), the Natural Science Foundation of Jiangsu Province (BK20210703), the China Science and Technology Plan Project of Suzhou (No. SYG202139), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX2_11342).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Y., Pan, Z., Chao, P., Fang, J., Chen, W., Zhao, L. (2022). Lunatory: A Real-Time Distributed Trajectory Clustering Framework for Web Big Data. In: Di Noia, T., Ko, IY., Schedl, M., Ardito, C. (eds) Web Engineering. ICWE 2022. Lecture Notes in Computer Science, vol 13362. Springer, Cham. https://doi.org/10.1007/978-3-031-09917-5_15
DOI: https://doi.org/10.1007/978-3-031-09917-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09916-8
Online ISBN: 978-3-031-09917-5
eBook Packages: Computer Science, Computer Science (R0)