Innovative Applications of O.R.
Simulation-based estimation of the real demand in bike-sharing systems in the presence of censoring

https://doi.org/10.1016/j.ejor.2019.02.013

Highlights

  • Inter-pickup/drop-off times censor the true inter-arrival times at high quantiles.

  • The censoring problem is more significant for heavy-tailed distributions.

  • The true demand is estimated via an iterative simulation-based inference method.

  • Ignoring the censoring effect can lead to incorrect/suboptimal decisions.

Abstract

Data on successful bike pickups/drop-offs censor the demand from customers/riders who were unable to pick up/drop off a bike due to bike/dock unavailability (i.e., balks). The objective of this paper is two-fold: (1) provide a formal comparison between the distribution of satisfied bike/dock demand and the true (latent) demand in bike-sharing systems through simulation experiments and nonparametric bootstrap tests to show when and how the two may differ; and (2) propose a novel methodology combining simulation, bootstrapping, and subset selection that harnesses the useful partial information in every bike pickup/drop-off observation (even if it is subject to censoring) to estimate the true demand in situations where the data filtering/cleaning approaches commonly used in the bike-sharing literature fail due to a lack of valid data. The results reveal that the distribution of inter-pickup/drop-off times may differ (statistically) from the distribution of the actual inter-arrival times of customers/bikes, primarily at higher percentile values and even if the demand rate is lower than the supply rate, especially if customer/bike inter-arrival times follow a heavy-tailed distribution. The statistical power of the proposed demand estimation approach in identifying an appropriate model for the underlying demand distribution is tested through simulation experiments as well as a real-world application. The paper has important academic and practical impacts by providing additional means to obtain and use statistically valid demand estimates, enhancing decision-making related to the design and operation of bike-sharing systems.
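The abstract does not specify the bootstrap statistic used. As an illustration only, the Python sketch below assumes a percentile-bootstrap test on the difference between a high quantile (here the 90th percentile) of two samples, for example observed inter-pickup times versus customer inter-arrival times, since the reported differences concentrate at higher percentiles. The function names `empirical_quantile` and `bootstrap_quantile_diff_test` and all default parameters are hypothetical and not taken from the paper.

```python
import math
import random


def empirical_quantile(data, q):
    """Sample quantile using linear interpolation between order statistics."""
    s = sorted(data)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def bootstrap_quantile_diff_test(x, y, q=0.9, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap test of H0: the q-th quantiles of x and y are equal.

    Returns (observed difference, (lower, upper) interval, reject_h0).
    H0 is rejected when the (1 - alpha) bootstrap interval excludes zero.
    """
    rng = random.Random(seed)
    observed = empirical_quantile(x, q) - empirical_quantile(y, q)
    diffs = []
    for _ in range(n_boot):
        # Resample each sample independently, with replacement.
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        diffs.append(empirical_quantile(xb, q) - empirical_quantile(yb, q))
    lower = empirical_quantile(diffs, alpha / 2)
    upper = empirical_quantile(diffs, 1 - alpha / 2)
    return observed, (lower, upper), not (lower <= 0.0 <= upper)


if __name__ == "__main__":
    rng = random.Random(1)
    ipt = [rng.expovariate(0.8) for _ in range(500)]   # stand-in "observed" gaps
    ciat = [rng.expovariate(1.0) for _ in range(500)]  # stand-in true gaps
    print(bootstrap_quantile_diff_test(ipt, ciat))
```

Resampling each sample independently treats the observations as i.i.d.; inter-event times at a station can be serially correlated, so this sketch is a simplification rather than a faithful reproduction of the paper's tests.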

Section snippets

Introduction and background

A bike-sharing system is a network of GPS-enabled bicycles distributed around a city for rent as an active form of public transportation. There are two primary types of bike-sharing systems: stationed and stationless. In a stationed system, customers can pick up and drop off bikes only at designated stations, while in a stationless system, bikes can be picked up and dropped off anywhere. These systems can play an important role in reducing traffic congestion and carbon emissions, and improving

Preliminaries

In the remainder of the paper, the focus is primarily on estimating the demand for bikes; the analysis of the demand for docks is similar, as discussed in Section 6. The components discussed in this section are used extensively throughout the paper.

How do IPT and CIAT distributions differ?

While it is clear that bike inter-pickup time (IPT) and actual customer inter-arrival time (CIAT) distributions may differ due to censoring, a formal analysis is needed to determine how the two may differ, an important question that the current literature leaves unanswered. This section investigates this question.

Simulation experiments are performed for different combinations of CIAT and bike inter-arrival time (BIAT) distribution families selected based on a preliminary analysis of real-world data from the CitiBike
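To make the censoring mechanism concrete, the sketch below simulates a single station under simplifying assumptions that are not taken from the paper: customers arrive with inter-arrival times drawn from a CIAT sampler, bikes are returned with inter-arrival times drawn from a BIAT sampler, a customer who finds no bike balks without leaving a record, a bike returned to a full station is turned away, and rebalancing is ignored. The capacity, initial inventory, rates, and function names are arbitrary illustrative choices. Only pickup times would appear in trip data, so the gaps between them (IPTs) are what an analyst actually observes.

```python
import random
import statistics


def simulate_station(ciat_sampler, biat_sampler, horizon, capacity=15,
                     initial_bikes=5, seed=0):
    """Simulate one station; return (customer_arrival_times, pickup_times).

    Assumptions (illustrative only): a customer who finds no bike balks and
    leaves no record; a bike returned to a full station is turned away;
    no rebalancing.
    """
    rng = random.Random(seed)
    bikes = initial_bikes
    next_customer = ciat_sampler(rng)
    next_bike = biat_sampler(rng)
    arrivals, pickups = [], []
    while min(next_customer, next_bike) < horizon:
        if next_customer <= next_bike:
            t = next_customer
            arrivals.append(t)            # true (latent) demand
            if bikes > 0:
                bikes -= 1
                pickups.append(t)         # satisfied demand: appears in trip data
            # else: a balk, invisible in the data (the source of censoring)
            next_customer = t + ciat_sampler(rng)
        else:
            t = next_bike
            bikes = min(bikes + 1, capacity)
            next_bike = t + biat_sampler(rng)
    return arrivals, pickups


def gaps(times):
    """Differences between consecutive event times."""
    return [b - a for a, b in zip(times, times[1:])]


def p90(values):
    """90th percentile via statistics.quantiles (default 'exclusive' method)."""
    return statistics.quantiles(values, n=10)[8]


if __name__ == "__main__":
    def biat(r):
        return r.expovariate(0.8)  # bikes are returned more slowly than demand

    ciats = {
        "exponential": lambda r: r.expovariate(1.0),         # mean 1
        "lognormal": lambda r: r.lognormvariate(-0.5, 1.0),  # mean 1, heavier tail
    }
    for name, ciat in ciats.items():
        arrivals, pickups = simulate_station(ciat, biat, horizon=10_000.0)
        print(name, "CIAT p90:", round(p90(gaps(arrivals)), 3),
              "IPT p90:", round(p90(gaps(pickups)), 3),
              "balk fraction:", round(1 - len(pickups) / len(arrivals), 3))
```

Raising the BIAT rate above the CIAT rate in `biat` lets one probe, in this toy setting, the abstract's observation that the two distributions can differ even when the demand rate is lower than the supply rate.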

The demand estimation problem

Before defining the demand estimation problem, it is important to note that the analyst first needs to determine whether sufficient valid observations can be extracted from the data; if so, existing data filtering approaches should be used to estimate the true demand. However, if this is not feasible due to a lack of valid data (as a result of insufficient supply and high demand) and/or the risk of mixing different demand patterns (as discussed above), then the demand estimation problem can be
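The iterative inference procedure is not detailed in this snippet. The sketch below shows one screening pass under stated assumptions: customer and bike inter-arrival times are exponential, candidate demand models form a hypothetical grid of rates, every candidate is simulated with the same seeded streams (common random numbers), and the distance between simulated and observed IPTs is the two-sample Kolmogorov-Smirnov statistic. In place of the paper's bootstrap tests and formal subset-selection procedure, it keeps every candidate whose distance lies within a user-chosen tolerance `delta` of the best one, a deliberately simple heuristic. All function names are hypothetical, and the "observed" IPTs in the usage block are simulated stand-ins, not real data.

```python
import bisect
import random


def simulate_pickups(demand_rate, supply_rate, horizon, capacity=15,
                     initial_bikes=3, seed=0):
    """Pickup times at one station with exponential customer and bike
    inter-arrival times; customers who find no bike balk unrecorded.

    Separate seeded streams for customers and bikes act as common random
    numbers: every candidate demand rate sees the same bike-return sequence.
    """
    rng_cust = random.Random(seed)
    rng_bike = random.Random(seed + 1)
    bikes = initial_bikes
    t_cust = rng_cust.expovariate(demand_rate)
    t_bike = rng_bike.expovariate(supply_rate)
    pickups = []
    while min(t_cust, t_bike) < horizon:
        if t_cust <= t_bike:
            if bikes > 0:
                bikes -= 1
                pickups.append(t_cust)
            t_cust += rng_cust.expovariate(demand_rate)
        else:
            bikes = min(bikes + 1, capacity)
            t_bike += rng_bike.expovariate(supply_rate)
    return pickups


def gaps(times):
    return [b - a for a, b in zip(times, times[1:])]


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:  # the supremum is attained at a sample point
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def screen_demand_models(observed_ipt, candidate_rates, supply_rate, horizon,
                         delta=0.05, seed=0):
    """One screening pass: keep every candidate demand rate whose simulated
    IPTs are within `delta` (KS distance) of the best-fitting candidate.

    Returns [(rate, distance), ...] sorted by distance.
    """
    scored = []
    for rate in candidate_rates:
        sim_ipt = gaps(simulate_pickups(rate, supply_rate, horizon, seed=seed))
        scored.append((rate, ks_statistic(observed_ipt, sim_ipt)))
    best = min(d for _, d in scored)
    return sorted((pair for pair in scored if pair[1] <= best + delta),
                  key=lambda pair: pair[1])


if __name__ == "__main__":
    # Stand-in "observed" IPTs generated from a known demand rate of 1.0.
    observed = gaps(simulate_pickups(1.0, 0.8, 10_000.0, seed=99))
    print(screen_demand_models(observed, [0.25, 0.5, 1.0, 2.0, 4.0],
                               supply_rate=0.8, horizon=10_000.0))
```

In this toy setup, when bike supply is the binding constraint, quite different demand rates can produce nearly the same IPT distribution, so more than one candidate may survive the screen; one way to iterate would be to refine the grid around the survivors and repeat.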

Real-world application

We consider station 519 in the CitiBike system between 8:00 AM and 9:00 AM on weekdays in February 2018. Since data on disabled bikes and docks are available, a bike availability threshold of 0 is used to determine valid IPT (VIPT) observations. On a typical weekday that month, bike demand at the station starts to increase at around 6:00 AM, and because little or no rebalancing is performed between 6:00 AM and 8:00 AM, the station has only a few functional bikes at 8:00 AM on almost every weekday.
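The snippet does not spell out how VIPT observations are extracted. A plausible reading, assumed here, is that an inter-pickup time is valid only if the number of available (functional) bikes stayed above the threshold for the whole interval, so no arriving customer could have balked. The sketch uses a hypothetical time-ordered event log with the labels "pickup", "dropoff", "add", and "remove" (the last two standing for rebalancing or bikes being repaired/disabled) and an assumed initial count of functional bikes; none of these names come from the CitiBike data format.

```python
# Change in available-bike count for each (hypothetical) event label.
DELTA = {"pickup": -1, "dropoff": +1, "add": +1, "remove": -1}


def valid_inter_pickup_times(initial_bikes, events, threshold=0):
    """Return the IPTs during which available bikes stayed above `threshold`.

    events: time-ordered (time, label) pairs. "add"/"remove" stand for
    rebalancing or bikes being repaired/disabled (assumed labels).
    """
    bikes = initial_bikes
    last_pickup = None   # time of the previous pickup
    min_since = None     # lowest bike count since that pickup
    vipts = []
    for time, label in events:
        if label == "pickup" and last_pickup is not None and min_since > threshold:
            # Bikes were available throughout, so no customer could have balked.
            vipts.append(time - last_pickup)
        bikes += DELTA[label]
        if bikes < 0:
            raise ValueError(f"negative bike count at time {time}")
        if label == "pickup":
            last_pickup, min_since = time, bikes
        elif last_pickup is not None:
            min_since = min(min_since, bikes)
    return vipts


if __name__ == "__main__":
    log = [(0.0, "pickup"), (2.0, "pickup"), (3.0, "dropoff"),
           (5.0, "pickup"), (6.0, "remove"), (8.0, "dropoff"), (9.0, "pickup")]
    print(valid_inter_pickup_times(3, log))  # → [2.0, 3.0]
```

The gap from 5.0 to 9.0 is discarded because the station emptied at 6.0, so that IPT is censored. Under this interpretation, `threshold=0` corresponds to the setting described above, and a larger threshold is simply more conservative.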

Discussion and conclusions

This paper provides a formal analysis of when and how bike/dock usage data censor the true demand for these resources in bike-sharing systems. For situations where the data filtering approaches commonly used in the bike-sharing literature are not applicable due to a lack of valid historical demand data, the paper introduces a novel iterative methodology combining discrete-event simulation, nonparametric bootstrap tests, and subset selection to harness the partial information on the underlying

Cited by (33)

    • Recursive decomposition probability model for demand estimation of street-hailing taxis utilizing GPS trajectory data

      2023, Transportation Research Part B: Methodological
      Citation excerpt:

      The first is simulation-based estimation and the second is statistics-based estimation. The former method has been used by Ciari et al. (2014), Jian et al. (2016), Negahban (2019), and Fields et al. (2021b). In these studies, a simulator that defines the procedure of service supply, demand occurrence, and pick up procedure was developed, and then the output result (e.g., number of pickups) was compared with the observed information (e.g., observed demand) to find the most likely distribution of demand.

    • Minimizing fleet size and improving vehicle allocation of shared mobility under future uncertainty: A case study of bike sharing

      2022, Journal of Cleaner Production
      Citation excerpt:

      In this paper, only a few trip demands are not met, which is unavoidable in actual operations. Existing studies (O'Mahony and Shmoys, 2015; Goh et al., 2019; Negahban, 2019) have found that the observed datasets are trip demands that have been met, and there are still some potential trip demands that are not met in actual operations. Because of the uncertainty of the future, the future trip demands cannot be completely predicted and fully met.

    • A simulation framework for a station-based bike-sharing system

      2022, Computers and Industrial Engineering
      Citation excerpt:

      We report those of particular interest and relevance to this paper. Negahban (2019) presents a novel approach to overcome the intrinsic censoring of user demand and estimate the real demand starting from the partial information available. Eren and Uz (2020) present a review of the factors influencing user demand, in particular weather conditions and temporal factors.

    • Predictive and prescriptive performance of bike-sharing demand forecasts for inventory management

      2022, Transportation Research Part C: Emerging Technologies
      Citation excerpt:

      Gammelli et al. (2020a, b) employ probabilistic techniques to estimate true demand using Tobit regression combined with Gaussian processes to mitigate the bias caused by censored demand observations, in both single and multi-output settings. In the presence of demand censoring, Negahban (2019) estimates real demand with a combination of simulation and bootstrapping, whereas Albiński et al. (2018) present a data-driven approach to estimate achieved service levels. Boufidis et al. (2020) compare various machine learning models in predicting station-level hourly pickups and returns.
