A parallel SP-DBSCAN algorithm on spark for waiting spot recommendation

Multimedia Tools and Applications

Abstract

Recommending taxi waiting spots to mobile passengers in complex urban transportation networks is challenging: traditional centralized mining platforms cannot handle the storage and computation demands of big GPS trajectory data, and identifying cluster boundaries with DBSCAN is especially difficult on the Spark parallel processing framework. To this end, we propose SP-DBSCAN, a parallel DBSCAN algorithm on Spark optimized with the silhouette coefficient and the pickup rate, in which users input only one parameter to obtain a distributed recommendation of the best waiting spot. Specifically, on the Hadoop distributed computing platform, we develop a general framework of distributed modeling for waiting spot recommendation on Spark, which addresses the distributed storage and parallel computing problems that a serial algorithm faces when partitioning and clustering large-scale traffic data on a single machine. Moreover, we put forward a parallel SP-DBSCAN algorithm on Spark to recommend the best waiting spot for passengers. It optimizes the traditional DBSCAN algorithm via the silhouette coefficient and the boarding ratio to address parameter sensitivity, and it handles the problem of locating the center of a non-convex cluster by assigning two centroids to one cluster in the clustering hotspot areas. Finally, experimental results on four groups of real-world taxi GPS trajectory data sets demonstrate that, compared with C-DBSCAN and P-DBSCAN, SP-DBSCAN increases the recognition rate by 1.6%, 6.2%, 3.47%, and 5.8%, respectively. The empirical study indicates that the clustering regions generated by SP-DBSCAN meet the requirement that passengers who fail to hail a taxi at a specific location can turn to another spot in the hotspot area at random and still get a ride.
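The idea of tuning DBSCAN with the silhouette coefficient can be illustrated with a minimal serial sketch. This is not the authors' SP-DBSCAN: it runs on a single machine without Spark, omits the pickup rate and the two-centroid step, and assumes that `min_pts` is supplied by the user while the neighborhood radius `eps` is picked from a candidate list. The names `dbscan`, `silhouette`, `best_eps`, and `pickups`, as well as the toy coordinates, are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain serial DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over clustered points (noise is ignored)."""
    clusters = {}
    for p, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return -1.0  # assumption: treat the undefined case as the worst score
    scores = []
    for label, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for m, other in clusters.items() if m != label)
            scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def best_eps(points, candidates, min_pts):
    """Return the (eps, score) pair with the highest silhouette coefficient."""
    scored = [(eps, silhouette(points, dbscan(points, eps, min_pts)))
              for eps in candidates]
    return max(scored, key=lambda pair: pair[1])

# Toy pickup locations: two well-separated groups of four points each.
pickups = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
eps, score = best_eps(pickups, candidates=[0.5, 1.5, 20.0], min_pts=3)
print(eps, round(score, 3))
```

In this toy case the smallest radius marks every point as noise and the largest merges everything into one cluster, so only the middle candidate yields two clusters and a positive silhouette score. A distributed version would additionally have to partition the data and merge clusters across partition boundaries, which this sketch does not attempt.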

Acknowledgements

The work described in this paper was supported in part by the National Natural Science Foundation of China (Grant nos. 61762020, 61773321, 62162012, 62173278, and 62072061), the Science and Technology Talents Fund for Excellent Young of Guizhou, China (Grant no. QKHPTRC20195669), the Science and Technology Support Program of Guizhou, China (Grant no. QKHZC2021YB531), and the Scientific Research Platform Project of Guizhou Minzu University (Grant no. GZMUSYS[2021]04). The authors would like to thank Datatang (Beijing) Technology Co., Ltd. for providing the experimental data.

Author information

Correspondence to Dawen Xia or Huaqing Li.

Ethics declarations

Conflicts of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xia, D., Bai, Y., Zheng, Y. et al. A parallel SP-DBSCAN algorithm on spark for waiting spot recommendation. Multimed Tools Appl 81, 4015–4038 (2022). https://doi.org/10.1007/s11042-021-11639-9

  • DOI: https://doi.org/10.1007/s11042-021-11639-9
