Enhancing multistep-ahead bike-sharing demand prediction with a two-stage online learning-based time-series model: insight from Seoul

The Journal of Supercomputing

Abstract

Bike-sharing addresses several urban challenges: it expands bike communities, lowers transportation costs, alleviates traffic congestion, reduces emissions, and improves health. Accurately predicting bike-sharing demand not only ensures that the system meets community needs but also optimizes resource allocation, reduces operational costs, and enhances the user experience, thereby increasing the system's sustainability and city-wide benefits. However, prediction is difficult in low-computing environments where data are insufficient because of privacy regulations or policy constraints. This study proposes an online learning-based two-stage forecasting model, designed for a low-computing environment with insufficient data, for robust and fast multistep-ahead prediction of bike-sharing demand in Seoul. We applied the model, together with exploratory data analysis (EDA), to the Seoul Bike Sharing Demand dataset, which was split into an insufficient training set and a sufficient testing set. First, we generated prediction values from the random forest, extreme gradient boosting, and Cubist methods for the training and testing sets. Second, we trained a random forest with the Ranger package on external factors and these prediction values, using time-series cross-validation, to produce multistep-ahead predictions from 1 h to 1 day ahead. We compared the model's performance with that of 23 machine learning and deep learning models to verify its superiority. Using interpretability methods and EDA, we report the relationships between external factors and bike-sharing demand.
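The time-series cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes an expanding training window and a 24-step (1 h to 1 day) horizon; the function names, the window sizes, the toy series, and the mean-value placeholder learner are all hypothetical and stand in for the Ranger-based second stage, which is not reproduced here.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for expanding-window time-series CV.

    Each training window starts at index 0 and ends just before the forecast
    origin; the test window covers the next `horizon` observations.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step  # new observations arrive; a real model would be refit here


def mean_forecast(history: List[float], horizon: int) -> List[float]:
    """Placeholder learner (assumption): repeat the training mean for every step."""
    level = sum(history) / len(history)
    return [level] * horizon


def evaluate(series: List[float], initial_train: int, horizon: int, step: int) -> float:
    """Mean absolute error over all folds and all forecast steps."""
    errors: List[float] = []
    for train_idx, test_idx in rolling_origin_splits(
        len(series), initial_train, horizon, step
    ):
        history = [series[i] for i in train_idx]
        forecast = mean_forecast(history, horizon)
        errors.extend(abs(series[i] - f) for i, f in zip(test_idx, forecast))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy hourly demand with a daily cycle: three days of 24 values (not the Seoul data).
    toy = [float(100 + 10 * (h % 24)) for h in range(72)]
    print(evaluate(toy, initial_train=24, horizon=24, step=24))
```

Because every test window lies strictly after its training window, no future observation leaks into training, which is the property that distinguishes this scheme from ordinary shuffled cross-validation.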

Availability of data and materials

The data that support the findings of this study are openly available from the UCI Machine Learning Repository: Seoul Bike Sharing Demand Dataset at https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand. These data were derived from the following resources available in the public domain: https://data.seoul.go.kr/index.do (Seoul bike sharing demand), https://data.kma.go.kr/resources/html/en/aowdp.html (weather data), and https://publicholidays.co.kr/ (holiday information).

Funding

This research was supported by a grant (2021-MOIS37-004) from the Intelligent Technology Development Program on Disaster Response and Emergency Management funded by the Ministry of Interior and Safety (MOIS, Korea) and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2018-0-01799) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation). This research was also supported by the Soonchunhyang University Research Fund.

Author information

Contributions

Conceptualization: SL; Methodology: SL; Writing—original draft: SL; Data curation: JO; Visualization: JO; Validation: JO; Software: JM; Formal analysis: JM; Writing—review and editing: JM; Project administration: MK; Funding acquisition: MK; Investigation: SR; Resources: SR; Supervision: SR.

Corresponding authors

Correspondence to Jihoon Moon or Seungmin Rho.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Comparative analysis of run times for DT-based EL models

Fig. 17 presents the run times of the decision tree (DT)-based ensemble learning (EL) models, each trained on the training set with its optimal hyperparameter values. XGBoost trained faster than Cubist. Overall, the random forest (RF) model (without parallel processing) was the quickest to train, and Cubist was the slowest. Therefore, the RF implemented using the Ranger package is suitable for online learning, even in a computing environment with limited performance. To generate input variables, we applied fivefold cross-validation with the optimal hyperparameter values to obtain RF, XGBoost, and Cubist prediction values for the training set. We then predicted bike-sharing demand on the testing set using the RF, XGBoost, and Cubist models trained with the optimal hyperparameters.

Fig. 17 Run times for each model
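The fivefold procedure described in this appendix, in which each training row receives a prediction from a model that never saw that row, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the folds are contiguous, the two placeholder learners (`fit_mean`, `fit_last`) stand in for RF, XGBoost, and Cubist, and all function names are hypothetical. A small timing helper is included because the appendix compares run times.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Assumption: a learner maps training targets to one constant prediction.
Learner = Callable[[Sequence[float]], float]


def kfold_indices(n_obs: int, k: int) -> List[List[int]]:
    """Split 0..n_obs-1 into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n_obs, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(y: Sequence[float], fit: Learner, k: int = 5) -> List[float]:
    """Predict every row with a learner fitted only on the other folds."""
    preds = [0.0] * len(y)
    for held_out in kfold_indices(len(y), k):
        held = set(held_out)
        train_y = [y[i] for i in range(len(y)) if i not in held]
        value = fit(train_y)
        for i in held_out:
            preds[i] = value
    return preds


def fit_mean(train_y: Sequence[float]) -> float:
    """Placeholder learner: mean of the training targets."""
    return sum(train_y) / len(train_y)


def fit_last(train_y: Sequence[float]) -> float:
    """Placeholder learner: last training target (naive forecast)."""
    return train_y[-1]


def timed(fn: Callable, *args) -> Tuple[object, float]:
    """Return the function result and its wall-clock run time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    y = [float(v) for v in range(1, 11)]  # toy targets, not the Seoul data
    oof_mean, seconds_mean = timed(out_of_fold_predictions, y, fit_mean)
    oof_last, seconds_last = timed(out_of_fold_predictions, y, fit_last)
    # Each row pairs the stage-one predictions that a second-stage model would
    # receive as input variables (alongside external factors, omitted here).
    stage_two_inputs = list(zip(oof_mean, oof_last))
    print(stage_two_inputs[0], seconds_mean, seconds_last)
```

Generating the stage-one predictions out of fold keeps the second-stage model from training on predictions that were fitted to the very targets it is trying to learn.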

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Leem, S., Oh, J., Moon, J. et al. Enhancing multistep-ahead bike-sharing demand prediction with a two-stage online learning-based time-series model: insight from Seoul. J Supercomput 80, 4049–4082 (2024). https://doi.org/10.1007/s11227-023-05593-6

  • DOI: https://doi.org/10.1007/s11227-023-05593-6
