
A scalable and fast OPTICS for clustering trajectory big data

Cluster Computing

Abstract

Clustering trajectory data is an important way to mine the information hidden in sampled data from moving objects, for example to understand trends in movement patterns, and it has gained high popularity in geographic information applications. In the era of "big data", current approaches to clustering trajectory data are generally impractical for trajectory big data because of their excessive costs in both scalability and computing performance. To address these problems, this study first proposes a new clustering algorithm for trajectory big data, Tra-POPTICS, by modifying POPTICS, a scalable clustering algorithm for point data. Tra-POPTICS employs a spatiotemporal distance function and trajectory indexing to support trajectory data, and it can process trajectory big data in a distributed manner to achieve high scalability. To provide a fast solution for clustering trajectory big data, this study also explores the feasibility of using contemporary general-purpose computing on graphics processing units (GPGPU). The GPGPU-aided clustering approach, G-Tra-POPTICS, parallelizes Tra-POPTICS with the Hyper-Q feature of the Kepler GPU and massive GPU threads. The experimental results indicate that (1) Tra-POPTICS has clustering quality comparable to T-OPTICS (the state-of-the-art work on clustering trajectories in a centralized fashion) and outperforms T-OPTICS by four times on average in terms of scalability, and (2) G-Tra-POPTICS also has clustering quality comparable to T-OPTICS and further gains about a 30× speedup on average for clustering trajectories compared with Tra-POPTICS running eight threads. The proposed algorithms exhibit great scalability and computing performance in clustering trajectory big data.
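The abstract does not define the spatiotemporal distance function, so the following is only an illustrative sketch of one plausible choice: the Euclidean gap between two moving objects, averaged over the time interval in which both are observed. The function names (`position_at`, `spatiotemporal_distance`), the `(time, x, y)` row layout, and the `samples` parameter are assumptions made for this example and are not taken from the article.

```python
import numpy as np


def position_at(traj, t):
    """Linearly interpolate (x, y) positions of a trajectory at times t.

    traj: array-like of shape (n, 3) with rows (time, x, y), sorted by time.
    """
    traj = np.asarray(traj, dtype=float)
    x = np.interp(t, traj[:, 0], traj[:, 1])
    y = np.interp(t, traj[:, 0], traj[:, 2])
    return np.stack([x, y], axis=-1)


def spatiotemporal_distance(a, b, samples=100):
    """Approximate time-averaged Euclidean distance between two trajectories.

    Assumption for this sketch: trajectories that share no time interval
    are treated as infinitely far apart.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    start = max(a[0, 0], b[0, 0])    # beginning of the common time interval
    end = min(a[-1, 0], b[-1, 0])    # end of the common time interval
    if start >= end:
        return float("inf")
    t = np.linspace(start, end, samples)
    gaps = np.linalg.norm(position_at(a, t) - position_at(b, t), axis=1)
    # The mean over evenly spaced samples approximates the time average.
    return float(gaps.mean())
```

For instance, two objects that move in parallel and stay 3 units apart over the same time span get a distance of 3 under this definition. A density-based method such as OPTICS could then use a function like this in place of the point-to-point distance when it searches for neighbouring trajectories.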



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61272314, 61361120098, 61440018), the Program for New Century Excellent Talents in University (NCET-11-0722), the Excellent Youth Foundation of Hubei Scientific Committee (No. 2012FFA025), the China Postdoctoral Science Foundation (2014M552112), the Fundamental Research Funds for the National University, China University of Geosciences (Wuhan) (Nos. CUG120114, CUG130617, 1410491B17), Beijing Microelectronics Technology Institute under the University Research Programme (No. BM-KJ-FK-WX-20130731-0013), and the Hubei Natural Science Foundation (No. 2014CFB904).

Author information


Corresponding author

Correspondence to Ze Deng.


About this article


Cite this article

Deng, Z., Hu, Y., Zhu, M. et al. A scalable and fast OPTICS for clustering trajectory big data. Cluster Comput 18, 549–562 (2015). https://doi.org/10.1007/s10586-014-0413-9


  • DOI: https://doi.org/10.1007/s10586-014-0413-9
