Abstract
Clustering trajectory data is an important way to mine the information hidden in moving-object sampling data, for example to understand trends in movement patterns, and it has gained high popularity in the geographic information domain and beyond. In the era of 'big data', current approaches to clustering trajectory data generally do not apply to trajectory big data because of their excessive costs in both scalability and computing performance. To address these problems, this study first proposes Tra-POPTICS, a new clustering algorithm for trajectory big data obtained by modifying POPTICS, a scalable clustering algorithm for point data. Tra-POPTICS employs a spatiotemporal distance function and trajectory indexing to support trajectory data, and it processes trajectory big data in a distributed manner to achieve high scalability. To provide a fast solution for clustering trajectory big data, this study further explores the feasibility of contemporary general-purpose computing on graphics processing units (GPGPU). The resulting GPGPU-aided approach, G-Tra-POPTICS, parallelizes Tra-POPTICS using the Hyper-Q feature of Kepler GPUs and massive numbers of GPU threads. The experimental results indicate that (1) Tra-POPTICS achieves clustering quality comparable to that of T-OPTICS (the state-of-the-art work on clustering trajectories in a centralized fashion) and outperforms T-OPTICS by a factor of four on average in terms of scalability, and (2) G-Tra-POPTICS likewise achieves clustering quality comparable to that of Tra-POPTICS and further gains a speedup of about 30 times on average for clustering trajectories compared with Tra-POPTICS running eight threads. The proposed algorithms exhibit high scalability and computing performance in clustering trajectory big data.
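To make the OPTICS machinery that Tra-POPTICS builds on more concrete, the following is a minimal sequential sketch of the classic OPTICS procedure (Ankerst et al.) applied to trajectories. It is not the authors' Tra-POPTICS or G-Tra-POPTICS: it runs in a single process, replaces trajectory indexing with brute-force pairwise distances, and `st_distance` is a hypothetical stand-in for the paper's spatiotemporal distance function (the mean Euclidean distance over shared timestamps, assuming trajectories are sampled at common integer timestamps). The names `optics`, `st_distance`, `line`, and the toy data are illustrative assumptions.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Assumption: a trajectory is a mapping from integer timestamp to (x, y).
Trajectory = Dict[int, Tuple[float, float]]


def st_distance(a: Trajectory, b: Trajectory) -> float:
    """Hypothetical spatiotemporal distance: mean Euclidean distance between
    the two objects over their shared timestamps (infinite if they never
    coexist in time). Not the paper's exact definition."""
    common = a.keys() & b.keys()
    if not common:
        return math.inf
    return sum(math.dist(a[t], b[t]) for t in common) / len(common)


def optics(trajs: List[Trajectory], eps: float, min_pts: int):
    """Sequential OPTICS over trajectories; returns (ordering, reachability)."""
    n = len(trajs)
    # Brute-force pairwise distances stand in for an index-backed range query.
    dist = [[st_distance(a, b) for b in trajs] for a in trajs]

    def neighbors(i: int) -> List[int]:
        # The eps-neighbourhood includes i itself (distance 0).
        return [j for j in range(n) if dist[i][j] <= eps]

    def core_distance(i: int, nbrs: List[int]) -> Optional[float]:
        # Distance to the min_pts-th closest neighbour (self counted),
        # or None if i is not a core object.
        if len(nbrs) < min_pts:
            return None
        return sorted(dist[i][j] for j in nbrs)[min_pts - 1]

    reach = [math.inf] * n
    processed = [False] * n
    order: List[int] = []

    def update(i: int, cd: float, nbrs: List[int], seeds: list) -> None:
        # Lower the reachability of unprocessed neighbours of core object i.
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(cd, dist[i][j])
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))  # lazy decrease-key

    def expand(i: int, seeds: list) -> None:
        # Mark i processed, emit it, and grow the seed list if it is core.
        processed[i] = True
        order.append(i)
        nbrs = neighbors(i)
        cd = core_distance(i, nbrs)
        if cd is not None:
            update(i, cd, nbrs, seeds)

    for start in range(n):
        if processed[start]:
            continue
        seeds: list = []
        expand(start, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if not processed[q]:  # skip stale heap entries
                expand(q, seeds)
    return order, reach


def line(x0: float, y0: float, dx: float, dy: float, steps: int = 5) -> Trajectory:
    """Toy straight-line trajectory sampled at t = 0..steps-1."""
    return {t: (x0 + dx * t, y0 + dy * t) for t in range(steps)}


trajs = [
    line(0, 0, 1, 0), line(0, 0.5, 1, 0), line(0, 1, 1, 0),        # group A
    line(50, 50, 0, 1), line(50.5, 50, 0, 1), line(51, 50, 0, 1),  # group B
    line(100, 0, -1, -1),                                          # outlier
]
order, reach = optics(trajs, eps=2.0, min_pts=3)
print(order)  # [0, 1, 2, 3, 4, 5, 6]
print(reach)  # [inf, 1.0, 0.5, inf, 1.0, 0.5, inf]
```

In the resulting ordering, each cluster shows up as a run of small reachability values that follows an item with infinite (or large) reachability: trajectories 0 to 2 and 3 to 5 form two such runs, and trajectory 6 is left as noise. The neighbourhood query inside `optics` is the step that an index and a distributed or GPU-parallel design would accelerate.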
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61272314, 61361120098, 61440018), the Program for New Century Excellent Talents in University (NCET-11-0722), the Excellent Youth Foundation of Hubei Scientific Committee (No. 2012FFA025), the China Postdoctoral Science Foundation (2014M552112), the Fundamental Research Funds for the National University, China University of Geosciences (Wuhan) (Nos. CUG120114, CUG130617, 1410491B17), Beijing Microelectronics Technology Institute under the University Research Programme (No. BM-KJ-FK-WX-20130731-0013), and the Hubei Natural Science Foundation (No. 2014CFB904).
Cite this article
Deng, Z., Hu, Y., Zhu, M. et al. A scalable and fast OPTICS for clustering trajectory big data. Cluster Comput 18, 549–562 (2015). https://doi.org/10.1007/s10586-014-0413-9
DOI: https://doi.org/10.1007/s10586-014-0413-9