Abstract
The wide availability of location-based sensors such as GPS has made it possible to collect large amounts of spatio-temporal data in the form of trajectories, each of which is the sequence of spatial locations that a moving object occupies as time progresses. Many applications, such as intelligent transportation systems and urban planning, can benefit from clustering the trajectories of cars in each locality of a city in order to learn about traffic behavior in each neighborhood. However, the immense and ever-increasing volume of trajectory data and the concept drift present in city traffic pose scalability challenges that have not been addressed. To fill this gap, we propose GTraclus, the first GPU algorithm for local trajectory clustering. We present a parallelized trajectory partitioning algorithm that simplifies trajectories into line segments using the Minimum Description Length (MDL) principle. We evaluated GTraclus on two large real-life trajectory datasets and compared it against MC-Traclus, a multicore CPU version of the popular trajectory clustering algorithm Traclus. In our experiments, GTraclus achieved an average speedup of up to \(24\times\) in execution time over MC-Traclus.
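To make the MDL-based partitioning step concrete, the following is a minimal sequential sketch of Traclus-style approximate partitioning. It is not the GPU kernel proposed in this article. The function names, the 2D point representation, and the choice to clamp distances below one unit to zero bits are assumptions made for illustration only.

```python
import math

def bits(x):
    # Assumption: values below 1.0 cost zero bits, which avoids log2(0).
    return math.log2(max(x, 1.0))

def seg_len(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def point_line_distance(p, a, b):
    # Distance from point p to the infinite line through a and b.
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return seg_len(a, p)
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / norm

def perpendicular_distance(a, b, c, d):
    # Combines the distances of c and d to line (a, b).
    l1 = point_line_distance(c, a, b)
    l2 = point_line_distance(d, a, b)
    return 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)

def angular_distance(a, b, c, d):
    # |cd| * sin(theta) for angles below 90 degrees, |cd| otherwise.
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = d[0] - c[0], d[1] - c[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    if ux * vx + uy * vy < 0.0:
        return nv
    return abs(ux * vy - uy * vx) / nu

def mdl_par(pts, i, j):
    # Cost of replacing pts[i..j] with the single segment (pts[i], pts[j]).
    a, b = pts[i], pts[j]
    cost = bits(seg_len(a, b))
    for k in range(i, j):
        cost += bits(perpendicular_distance(a, b, pts[k], pts[k + 1]))
        cost += bits(angular_distance(a, b, pts[k], pts[k + 1]))
    return cost

def mdl_nopar(pts, i, j):
    # Cost of keeping every original segment between pts[i] and pts[j].
    return sum(bits(seg_len(pts[k], pts[k + 1])) for k in range(i, j))

def partition(pts):
    # Returns the characteristic points that delimit the line segments.
    if len(pts) < 2:
        return list(pts)
    cps = [pts[0]]
    start, length = 0, 1
    while start + length < len(pts):
        curr = start + length
        if length > 1 and mdl_par(pts, start, curr) > mdl_nopar(pts, start, curr):
            cps.append(pts[curr - 1])
            start, length = curr - 1, 1
        else:
            length += 1
    cps.append(pts[-1])
    return cps
```

For an L-shaped trajectory such as `[(0, 0), (10, 0), (20, 0), (30, 0), (30, 10), (30, 20), (30, 30)]`, this sketch keeps only the two endpoints and the corner `(30, 0)`. Each trajectory is partitioned independently of the others, which is one reason this step lends itself to parallel execution.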
Acknowledgements
This work is supported in part by the National Science Foundation under Grant Nos. 1302439 and 1302423.
Author information
Contributions
All authors contributed equally to the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Mustafa, H., Barrus, C., Leal, E. et al. GTraclus: a novel algorithm for local trajectory clustering on GPUs. Distrib Parallel Databases 41, 467–488 (2023). https://doi.org/10.1007/s10619-023-07429-x
DOI: https://doi.org/10.1007/s10619-023-07429-x