Abstract
Recommending taxi waiting spots to mobile passengers in complex urban transportation networks is challenging. Traditional centralized mining platforms cannot meet the storage and computation demands of big GPS trajectory data, and identifying cluster boundaries for DBSCAN is particularly difficult on the Spark parallel processing framework. To address this, we propose SP-DBSCAN, a parallel DBSCAN optimization algorithm on Spark that uses the silhouette coefficient and the pickup rate. With SP-DBSCAN, users need to input only one parameter to obtain a distributed recommendation of the best waiting spot. Specifically, on the Hadoop distributed computing platform, we develop a general distributed modeling framework for waiting spot recommendation on Spark. This framework resolves the distributed storage and parallel computing problems that a serial algorithm faces when it partitions and clusters large-scale traffic data on a single machine. Moreover, we put forward the parallel SP-DBSCAN algorithm on Spark to recommend the best waiting spot for passengers. It optimizes the traditional DBSCAN algorithm via the silhouette coefficient and the pickup (boarding) rate to overcome DBSCAN's parameter sensitivity, and it solves the problem of locating the center of a non-convex cluster by representing a cluster in a hotspot area with two centroids. Finally, experimental results on four real-world taxi GPS trajectory data sets show that, compared with C-DBSCAN and P-DBSCAN, SP-DBSCAN increases the recognition rate by 1.6%, 6.2%, 3.47%, and 5.8% on the four data sets, respectively. The empirical study indicates that the clustering regions generated by SP-DBSCAN meet passengers' needs: a passenger who fails to catch a taxi at a particular spot and moves on to another spot at random can still board within the hotspot area.
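The abstract's central idea is to remove DBSCAN's sensitivity to its radius parameter by scoring candidate clusterings with the silhouette coefficient. A small serial sketch can illustrate this. It assumes that the single user-supplied parameter is MinPts, that Eps is chosen from a hypothetical list of candidate values, and that noise points are excluded from the silhouette. The pickup-rate criterion, the two-centroid step, and the Spark partitioning are not modeled. The helper names (`region_query`, `dbscan`, `silhouette`, `select_eps`) are illustrative and do not come from the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in a projected, metric coordinate system (assumption)
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Serial textbook DBSCAN: one cluster id per point, NOISE (-1) for noise."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds, k = list(neighbors), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, never expands it
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point, so keep expanding
    return [NOISE if lab is None else lab for lab in labels]


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over non-noise points; -1.0 if fewer than 2 clusters."""
    clusters: Dict[int, List[Point]] = {}
    for p, c in zip(points, labels):
        if c != NOISE:
            clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined here; score it as the worst value
    scores: List[float] = []
    for c, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # conventional value for singleton clusters
                continue
            # a: mean distance to the other members of p's own cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for oc, other in clusters.items() if oc != c)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def select_eps(points: Sequence[Point], min_pts: int,
               candidates: Sequence[float]) -> Tuple[float, float, List[int]]:
    """Return (eps, silhouette, labels) for the candidate eps with the highest silhouette."""
    best: Optional[Tuple[float, float, List[int]]] = None
    for eps in candidates:
        labels = dbscan(points, eps, min_pts)
        score = silhouette(points, labels)
        if best is None or score > best[1]:
            best = (eps, score, labels)
    if best is None:
        raise ValueError("candidates must be non-empty")
    return best


if __name__ == "__main__":
    # Two synthetic pickup hotspots plus one isolated point (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    eps, score, labels = select_eps(pts, min_pts=3, candidates=[0.5, 1.5, 20.0])
    print(eps, labels)  # 1.5 [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In the example, Eps = 0.5 marks every point as noise, and Eps = 20.0 merges both hotspots into one cluster. Both cases leave the silhouette undefined, so Eps = 1.5 wins. Every candidate requires all pairwise distances, so the cost is O(n²) per candidate on one machine. That is the kind of cost a distributed framework such as the one described above is meant to spread across a cluster.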
References
Akbari Z, Unland R (2016) Automated determination of the input parameter of DBSCAN based on outlier detection. In: International conference on artificial intelligence applications and innovations. Springer, pp 280–291
Alshammari H, Lee J, Bajwa H (2016) H2Hadoop: improving hadoop performance using the metadata of related jobs. IEEE Trans Cloud Comput 6:1031–1040
Asadianfam S, Shamsi M, Kenari AR (2020) Big data platform of traffic violation detection system: identifying the risky behaviors of vehicle drivers. Multimed Tools Appl 79:24645–24684
Chen C, Zhang D, Li N, Zhou Z-H (2014) B-Planner: planning bidirectional night bus routes using large-scale taxi GPS traces. IEEE Trans Intell Transp Syst 15:1451–1465
Chmiel W, Danda J, Dziech A, Ernst S, Kadluczka P, Mikrut Z, Pawlik P, Szwed P, Wojnicki I (2016) INSIGMA: an intelligent transportation system for urban mobility enhancement. Multimed Tools Appl 75:10529–10560
Farajzadeh N, Karamiani A, Hashemzadeh M (2018) A fast and accurate moving object tracker in active camera model. Multimed Tools Appl 77:6775–6797
Han D, Agrawal A, Liao WK, Choudhary A (2018) Parallel DBSCAN algorithm using a data partitioning strategy with Spark implementation. In: 2018 IEEE International conference on big data (Big Data). IEEE, pp 305–312
Han D, Agrawal A, Liao WK, Choudhary A (2016) A novel scalable DBSCAN algorithm with spark. In: 2016 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, pp 1393–1402
He Y, Tan H, Luo W, Feng S, Fan J (2014) MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data. Front Comput Sci 8:83–99
Heidari S, Alborzi M, Radfar R, Afsharkazemi MA, Ghatari AR (2019) Big data clustering with varied density based on MapReduce. J Big Data 6:77
Hou J, Zhang B (2018) Cluster merging based on a decision threshold. Neural Comput Appl 30:99–110
Hou Q, Zhang X, Li B, Zhang X, Wang W (2019) Identification of low-carbon travel block based on GIS hotspot analysis using spatial distribution learning algorithm. Neural Comput Appl 31:4703–4713
Hu H, Zhang G, Gao W, Wang M (2020) Big data analytics for MOOC video watching behavior based on Spark. Neural Comput Appl 32:6481–6489
Huang F, Zhu Q, Zhou J, Tao J, Zhou X, Jin D, Tan X, Wang L (2017) Research on the parallelization of the DBSCAN clustering algorithm for spatial data mining based on the Spark platform. Remote Sens 9:1301
Jiang X, Adeli H (2005) Dynamic wavelet neural network model for traffic flow forecasting. J Transp Eng 131:771–779
Lai W, Zhou M, Hu F, Bian K, Song Q (2019) A new DBSCAN parameters determination method based on improved MVO. IEEE Access 7:104085–104095
Lei X, Ding Y, Wu FX (2016) Detecting protein complexes from DPINs by density based clustering with Pigeon-inspired optimization algorithm. Sci China Inf Sci 59:070103
Li Y, Chen D (2016) A learning-based comprehensive evaluation model for traffic data quality in intelligent transportation systems. Multimed Tools Appl 75:1–16
Li L, Xiong Z, Dai Q, Zha Y, Zhang Y, Dan J (2020) A novel graph-based clustering method using noise cutting. Inf Syst 91:101504
Liu P, Wang R, Ding J, Yin X (2017) Performance modeling and evaluating workflow of ITS: real-time positioning and route planning. Multimed Tools Appl 77:10867–10881
Luo G, Luo X, Gooch TF, Tian L, Qin K (2016) A parallel DBSCAN algorithm based on spark. In: 2016 IEEE international conferences on big data and cloud computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom). IEEE, pp 548–553
Marinakis V, Doukas H, Tsapelas J, Mouzakitis S, Sicilia Á, Madrazo L, Sgouridis S (2020) From big data to smart energy services: an application for intelligent energy management. Future Gener Comput Syst 110:572–586
Miao F, Han S, Lin S, Stankovic JA, Zhang D, Munir S, Huang H, He T, Pappas GJ (2016) Taxi dispatch with real-time sensing data in metropolitan areas: a receding horizon control approach. IEEE Trans Autom Sci Eng 13:463–478
Peixoto DA, Nguyen HQV, Zheng B, Zhou X (2019) A framework for parallel map-matching at scale using Spark. Distributed and Parallel Databases 37:697–720
Qiu Z, Li H, Hong S, Lin Y, Fan N, Ou G, Wang T, Fan L (2014) Finding vacant taxis using large scale GPS traces. In: International conference on web-age information management. Springer, pp 793–804
Qu Z, Wang X, Song X, Pan Z, Li H (2019) Location optimization for urban taxi stands based on taxi GPS trajectory big data. IEEE Access 7:62273–62283
Rafi M, Mukhopadhyay S (2019) Salient object detection employing regional principal color and texture cues. Multimed Tools Appl 78:19735–19751
Rong H, Zhang X, Liu Q, Yang Q, Gu J (2018) A Markov decision process approach to optimizing waiting for taxis. In: 2018 IEEE SmartWorld, ubiquitous intelligence & computing, advanced & trusted computing, scalable computing & communications, cloud & big data computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, pp 1346–1351
Rong H, Zhang X, Li Z, Ai Z (2020) Waiting or moving? A crossroad network-based Markov decision process approach to catch vacant taxis. IEEE Access 8:10528–10542
Segatori A, Marcelloni F, Pedrycz W (2017) On distributed fuzzy decision trees for big data. IEEE Trans Fuzzy Syst 26:174–192
Starczewski A, Cader A (2019) Determining the Eps parameter of the DBSCAN algorithm. In: International conference on artificial intelligence and soft computing. Springer, pp 420–430
Sun S, Xu X (2010) Variational inference for infinite mixtures of Gaussian processes with applications to traffic flow prediction. IEEE Trans Intell Transp Syst 12:466–475
Wang W, Tao L, Gao C, Wang B, Yang H, Zhang Z (2014) A C-DBSCAN algorithm for determining bus-stop locations based on taxi GPS data. In: International conference on advanced data mining and applications. Springer, pp 293–304
Wang H, Belhassena A (2017) Parallel trajectory search based on distributed index. Inf Sci 388:62–83
Wang L, Zhang Y, Zhao X, Liu H, Zhang K (2019) Irregular travel groups detection based on cascade clustering in urban subway. IEEE Trans Intell Transp Syst 21:2216–2225
Wang C, Gong L, Li X, Zhou X (2020) A ubiquitous machine learning accelerator with automatic parallelization on FPGA. IEEE Trans Parallel Distrib Syst 31:2346–2359
Xia D, Wang B, Li H, Li Y, Zhang Z (2016) A distributed spatial-temporal weighted model on MapReduce for short-term traffic flow forecasting. Neurocomputing 179:246–263
Yuan NJ, Zheng Y, Zhang L, Xie X (2012) T-Finder: a recommender system for finding passengers and vacant taxis. IEEE Trans Knowl Data Eng 25:2390–2403
Zhang Y, Feng D, Zhang R, Geng N (2017) Multi-stage optimization of taxi service stations location using GPS data. In: 2017 IEEE 2nd international conference on big data analysis (ICBDA). IEEE, pp 316–322
Zhang J, Li X, Nie W, Su Y (2017) Automatic report generation based on multi-modal information. Multimed Tools Appl 76:12005–12015
Zheng X, Liang X, Xu K (2012) Where to wait for a taxi? In: Proceedings of the ACM SIGKDD international workshop on urban computing, pp 149–156
Acknowledgements
The work described in this paper was supported in part by the National Natural Science Foundation of China (Grant nos. 61762020, 61773321, 62162012, 62173278, and 62072061), the Science and Technology Talents Fund for Excellent Young of Guizhou, China (Grant no. QKHPTRC20195669), the Science and Technology Support Program of Guizhou, China (Grant no. QKHZC2021YB531), and the Scientific Research Platform Project of Guizhou Minzu University (Grant no. GZMUSYS[2021]04). The authors would like to thank Datatang (Beijing) Technology Co., Ltd. for providing the experimental data.
Ethics declarations
Conflicts of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xia, D., Bai, Y., Zheng, Y. et al. A parallel SP-DBSCAN algorithm on spark for waiting spot recommendation. Multimed Tools Appl 81, 4015–4038 (2022). https://doi.org/10.1007/s11042-021-11639-9
DOI: https://doi.org/10.1007/s11042-021-11639-9