Parallel grid-based density peak clustering of big trajectory data

Applied Intelligence

Abstract

With the widespread adoption of data-intensive applications such as navigation systems for mobile devices and unmanned vehicles, analyzing trajectory data has become a key research area. One of the main tasks is trajectory clustering, which consists of automatically grouping similar trajectories into clusters. Density Peak Clustering (DPC) is widely used for this task because it is fast and requires few manually set parameters. However, its performance does not scale well to large datasets. To address this issue, this paper proposes an efficient parallel trajectory clustering algorithm named Tra-PDPC (Trajectory-Parallel DPC). It proceeds in three steps: trajectory division and partition, trajectory similarity calculation, and clustering. All three steps are designed to run in a distributed fashion using the Spark programming model. For the first step, a scheme is proposed to divide sub-trajectories based on local grid area density. Then, a combined similarity measure based on Euclidean space and grid space is defined for computing sub-trajectory similarity. Finally, a version of DPC is applied, which dramatically improves clustering speed. Experiments on multiple large realistic trajectory datasets demonstrate that the proposed Tra-PDPC algorithm considerably decreases runtime while providing high accuracy.
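For readers unfamiliar with DPC, the following is a minimal single-machine sketch of the basic density peak idea on plain 2D points: each point gets a local density (the number of neighbours within a cutoff distance) and a distance to its nearest higher-density point, and points where both are large are taken as cluster centers. It is not the Tra-PDPC algorithm, uses no Spark, and ignores trajectories, grids, and the combined similarity measure. The function name `density_peak_clustering`, the parameters `d_c` and `n_clusters`, and the choice of the global maximum distance for the densest point are assumptions made for illustration only.

```python
import math


def density_peak_clustering(points, d_c, n_clusters):
    """Illustrative single-machine DPC on 2D points (hypothetical helper)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta[i]: distance to the nearest point that precedes i in density order.
    # parent[i]: that point. Assumption: the densest point receives the global
    # maximum distance, so it is always selected as a center below.
    global_max = max(max(row) for row in dist)
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = global_max
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: the n_clusters points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label

    # Every other point inherits the label of its nearest denser neighbour.
    # Walking in density order guarantees the parent is labelled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

The pairwise distance matrix makes this sketch quadratic in the number of points, which illustrates why plain DPC does not scale well to large datasets.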

Acknowledgments

This research is sponsored by the Science and Technology Planning Project of Sichuan Province under Grant No. 2020YFG0054, and the Joint Funds of the Ministry of Education of China.

Author information

Corresponding author

Correspondence to Xinzheng Niu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Emerging topics in Applied Intelligence selected from IEA/AIE2020

Guest Editors: Hamido Fujita, Philippe Fournier-Viger and Moonis Ali

About this article

Cite this article

Niu, X., Zheng, Y., Fournier-Viger, P. et al. Parallel grid-based density peak clustering of big trajectory data. Appl Intell 52, 17042–17057 (2022). https://doi.org/10.1007/s10489-021-02757-w

  • DOI: https://doi.org/10.1007/s10489-021-02757-w
