Abstract
Bike-sharing helps address urban challenges: it expands bike communities, lowers transportation costs, alleviates traffic congestion, reduces emissions, and improves health. Accurately predicting bike-sharing demand not only ensures the system meets community needs but also optimizes resource allocation, reduces operational costs, and enhances the user experience, thereby increasing the system's sustainability and city-wide benefits. However, prediction is difficult in low-computing environments where privacy regulations or policy constraints leave too little data. This study proposes an online learning-based two-stage forecasting model, designed for low-computing environments with insufficient data, for robust and fast multistep-ahead prediction of bike-sharing demand in Seoul. We applied the model, together with exploratory data analysis (EDA), to the Seoul Bike Sharing Demand dataset, which we split into a small (insufficient) training set and a sufficient testing set. First, we generated prediction values with the random forest, extreme gradient boosting, and Cubist methods for both the training and testing sets. Second, we trained a random forest from the Ranger package on the external factors and these prediction values, using time-series cross-validation, to predict demand from 1 h to 1 day ahead. To verify its superiority, we compared the model's performance with that of 23 machine learning and deep learning models. Using interpretability methods and EDA, we report the relationships between external factors and bike-sharing demand.
Availability of data and materials
The data that support the findings of this study are openly available from the UCI Machine Learning Repository: Seoul Bike Sharing Demand Dataset at https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand. These data were derived from the following resources available in the public domain: https://data.seoul.go.kr/index.do (Seoul bike sharing demand), https://data.kma.go.kr/resources/html/en/aowdp.html (weather data), and https://publicholidays.co.kr/ (holiday information).
Funding
This research was supported by a grant (2021-MOIS37-004) from the Intelligent Technology Development Program on Disaster Response and Emergency Management funded by the Ministry of Interior and Safety (MOIS, Korea) and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2018-0-01799) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation). This research was also supported by the Soonchunhyang University Research Fund.
Author information
Contributions
Conceptualization: SL; Methodology: SL; Writing—original draft: SL; Data curation: JO; Visualization: JO; Validation: JO; Software: JM; Formal analysis: JM; Writing—review and editing: JM; Project administration: MK; Funding acquisition: MK; Investigation: SR; Resources: SR; Supervision: SR.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1: Comparative analysis of run times for DT-based EL models
Fig. 17 presents the run times of the DT-based EL models trained on the training set with optimal hyperparameter values. XGBoost trained faster than Cubist; overall, the RF model (without parallel processing) was the quickest to train and Cubist the slowest. The RF implemented using the Ranger package is therefore suitable for online learning, even in a computing environment with limited performance. To generate input variables for the second-stage model, we applied fivefold cross-validation with optimal hyperparameter values to obtain RF, XGBoost, and Cubist prediction values for the training set. We then predicted bike-sharing demand on the testing set using the RF, XGBoost, and Cubist models trained on the full training set with the same optimal hyperparameters.
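The fivefold procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, which relies on R packages. The learner `fit_mean_by_hour`, the helper names, the contiguous (unshuffled) fold layout, and the toy data are all assumptions made for illustration; in practice the stand-in learner would be replaced by RF, XGBoost, or Cubist. The sketch shows one idea only: each training row is predicted by a model that never saw that row, whereas test rows are predicted by a model fitted on the whole training set.

```python
from statistics import mean


def kfold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size (assumed layout)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean_by_hour(rows, targets):
    """Hypothetical stand-in learner: average demand per hour of day."""
    by_hour = {}
    for (hour,), y in zip(rows, targets):
        by_hour.setdefault(hour, []).append(y)
    overall = mean(targets)  # fallback for hours absent from the training rows
    table = {h: mean(v) for h, v in by_hour.items()}
    return lambda row: table.get(row[0], overall)


def out_of_fold_predictions(rows, targets, fit, k=5):
    """Predict every training row with a model fitted on the other folds."""
    preds = [None] * len(rows)
    for fold in kfold_indices(len(rows), k):
        held_out = set(fold)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_y = [y for i, y in enumerate(targets) if i not in held_out]
        model = fit(train_rows, train_y)
        for i in fold:
            preds[i] = model(rows[i])
    return preds


# Toy data (made up): two days of hourly demand with one feature, hour of day.
train_rows = [(h % 24,) for h in range(48)]
train_y = [100 + 10 * (h % 24) for h in range(48)]
test_rows = [(8,), (18,)]

# Training-set inputs for the second stage: out-of-fold predictions.
oof = out_of_fold_predictions(train_rows, train_y, fit_mean_by_hour, k=5)

# Testing-set inputs for the second stage: model fitted on all training rows.
full_model = fit_mean_by_hour(train_rows, train_y)
test_preds = [full_model(r) for r in test_rows]
```

Repeating this once per base model would yield three prediction columns that can be placed alongside the external factors as inputs to the second-stage model. Generating the training-set columns out of fold keeps the second stage from learning on predictions that the base models made for rows they had already memorized.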
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Leem, S., Oh, J., Moon, J. et al. Enhancing multistep-ahead bike-sharing demand prediction with a two-stage online learning-based time-series model: insight from Seoul. J Supercomput 80, 4049–4082 (2024). https://doi.org/10.1007/s11227-023-05593-6
DOI: https://doi.org/10.1007/s11227-023-05593-6