GTraclus: a novel algorithm for local trajectory clustering on GPUs

Distributed and Parallel Databases

Abstract

The wide availability of location-based sensors such as GPS has made it possible to collect large amounts of spatio-temporal data in the form of trajectories, each of which is a sequence of spatial locations that a moving object occupies as time progresses. Many applications, such as intelligent transportation systems and urban planning, can benefit from clustering the trajectories of cars in each locality of a city in order to learn about traffic behavior in each neighborhood. However, the immense and ever-increasing volume of trajectory data and the concept drift present in city traffic pose scalability challenges that have not been addressed. To fill this gap, we propose the first GPU algorithm for local trajectory clustering, called GTraclus. We present a parallelized trajectory partitioning algorithm that simplifies trajectories into line segments using the Minimum Description Length (MDL) principle. We evaluated the proposed algorithm on two large real-life trajectory datasets and compared it against MC-Traclus, a multicore CPU version of the popular trajectory clustering algorithm Traclus. In our experiments, GTraclus ran up to \(24\times\) faster on average than MC-Traclus.
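To make the MDL-based partitioning step concrete, the following is a minimal sequential sketch of the approximate partitioning heuristic generally associated with Traclus. It is not the parallel GPU algorithm proposed in the article, whose details the abstract does not give. The function names, the 2D point tuples, and the rule that distances below 1 cost zero bits are assumptions made for illustration only.

```python
import math


def _bits(x):
    # Assumption: values at or below 1 cost zero bits, which also avoids log2(0).
    return math.log2(x) if x > 1.0 else 0.0


def _dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])


def _cross(a, b, c, d):
    # z-component of the cross product of vectors ab and cd.
    return (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])


def perpendicular_distance(a, b, c, d):
    # Combines the distances of c and d to the line through a and b.
    length = _dist(a, b)
    if length == 0.0:
        return 0.0
    l1 = abs(_cross(a, b, a, c)) / length
    l2 = abs(_cross(a, b, a, d)) / length
    return 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)


def angle_distance(a, b, c, d):
    # |cd| * sin(theta) for angles below 90 degrees, |cd| otherwise.
    length = _dist(a, b)
    if length == 0.0:
        return 0.0
    dot = (b[0] - a[0]) * (d[0] - c[0]) + (b[1] - a[1]) * (d[1] - c[1])
    if dot < 0.0:
        return _dist(c, d)
    return abs(_cross(a, b, c, d)) / length


def mdl_par(points, i, j):
    # Cost of replacing points[i..j] by the single segment points[i]-points[j].
    a, b = points[i], points[j]
    cost = _bits(_dist(a, b))  # L(H): length of the hypothesis segment
    for k in range(i, j):      # L(D|H): deviation of each original segment
        c, d = points[k], points[k + 1]
        cost += _bits(perpendicular_distance(a, b, c, d))
        cost += _bits(angle_distance(a, b, c, d))
    return cost


def mdl_nopar(points, i, j):
    # Cost of keeping every original segment between points[i] and points[j].
    return sum(_bits(_dist(points[k], points[k + 1])) for k in range(i, j))


def partition(points):
    """Return the characteristic points of one trajectory (greedy scan)."""
    n = len(points)
    if n < 2:
        return list(points)
    characteristic = [points[0]]
    start, length = 0, 1
    while start + length < n:
        curr = start + length
        if mdl_par(points, start, curr) > mdl_nopar(points, start, curr):
            # Simplifying up to curr is too costly: cut at the previous point.
            characteristic.append(points[curr - 1])
            start, length = curr - 1, 1
        else:
            length += 1
    characteristic.append(points[-1])
    return characteristic
```

For example, `partition([(0, 0), (100, 0), (200, 0), (200, 100), (200, 200)])` keeps the two endpoints and the corner at `(200, 0)`, so the L-shaped trajectory is simplified into two line segments. Because each trajectory is scanned independently, this step lends itself to processing many trajectories in parallel, although how GTraclus maps the work onto GPU threads is not stated in the abstract.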


Acknowledgements

This work is supported in part by the National Science Foundation under Grant Nos. 1302439 and 1302423.

Author information

Contributions

All authors contributed equally to the manuscript.

Corresponding author

Correspondence to Eleazar Leal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mustafa, H., Barrus, C., Leal, E. et al. GTraclus: a novel algorithm for local trajectory clustering on GPUs. Distrib Parallel Databases 41, 467–488 (2023). https://doi.org/10.1007/s10619-023-07429-x

  • DOI: https://doi.org/10.1007/s10619-023-07429-x
