
A tutorial on value function approximation for stochastic and dynamic transportation

  • Educational Paper
  • Published in: 4OR

Abstract

This paper provides an introductory tutorial on Value Function Approximation (VFA), a solution class from Approximate Dynamic Programming. VFA is a heuristic approach to solving sequential decision processes such as Markov Decision Processes. Real-world problems in supply chain management (and beyond) that contain dynamic and stochastic elements can be modeled as such processes, but large-scale instances cannot be solved to optimality by enumeration due to the curses of dimensionality. VFA can be a suitable method for these cases, and this tutorial is designed to ease its use in research, practice, and education. To this end, the tutorial describes VFA in the context of stochastic and dynamic transportation and makes three main contributions. First, it gives a concise theoretical overview of VFA’s fundamental concepts, outlines a generic VFA algorithm, and briefly discusses advanced topics of VFA. Second, the VFA algorithm is applied to the taxicab problem, an easy-to-understand transportation planning task. Detailed step-by-step results are presented for a small-scale instance, allowing readers to gain an intuition for VFA’s main principles. Third, larger instances are solved by enhancing the basic VFA algorithm, demonstrating its general capability to approach more complex problems. The experiments are done with artificial instances, and the respective Python scripts are part of an electronic appendix. Overall, the tutorial provides the necessary knowledge to apply VFA to a wide range of stochastic and dynamic settings and addresses researchers, lecturers, tutors, students, and practitioners alike.


Data availability statement

The experiments used artificial data generated with Python. The scripts for the algorithms are part of Appendix A.


Author information

Correspondence to Arne Heinold.

Ethics declarations

Conflict of interest

The research leading to this tutorial received funding from the German Research Foundation (DFG) under reference 268276815. The tutorial is part of my PhD thesis, which would not have been possible without the generous support of my supervisor, who gave me the freedom to write this spin-off paper. Finally, I thank the journal’s review team for their valuable comments, which helped to improve the manuscript considerably.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (xlsx 11 KB)

Supplementary file 2 (py 12 KB)

Appendices

Appendix A. Python scripts

The Python scripts with the implementations of the two solution approaches (backward dynamic programming, VFA) are available as additional files in the electronic appendix of this paper and in the following repository: https://github.com/aheiX/Tutorial-on-Value-Function-Approximation. These scripts can be used to reproduce results from the experimental study. All data and materials are also available from the corresponding author on request.

Appendix B. Additional algorithms

This section describes two algorithms: first, Algorithm 2 gives a generic description of solving the Bellman equation by backward dynamic programming and, second, Algorithm 3 extends the generic VFA algorithm (see Algorithm 1 in Section 5) with a greedy exploration strategy. A minimal Python sketch of the backward procedure follows the algorithm figures.

figure b: Algorithm 2 (backward dynamic programming)
figure c: Algorithm 3 (VFA with greedy exploration)
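
For concreteness, the following minimal Python sketch illustrates the idea behind Algorithm 2: the Bellman equation is solved backward in time over a finite horizon, memoizing the values of period \( t+1 \) before computing those of period \( t \). The model components (states, decision sets, transition probabilities, expected rewards, discount factor) are generic, illustrative placeholders and not the paper's taxicab instance.

```python
# Minimal sketch of backward dynamic programming (cf. Algorithm 2).
# The model below is an illustrative placeholder, not the taxicab instance.

def backward_dp(states, decisions, p, r, T, gamma=0.8):
    """Solve the Bellman equation backward in time.

    p[s][x]: dict mapping successor state -> transition probability
    r[s][x]: expected immediate reward of decision x in state s
    Returns the value functions V[t][s] and a greedy policy.
    """
    V = {T: {s: 0.0 for s in states}}  # terminal values are zero
    policy = {}
    for t in reversed(range(T)):  # t = T-1, ..., 0
        V[t] = {}
        for s in states:
            best_x, best_v = None, float("-inf")
            for x in decisions(s):
                # Bellman equation: immediate reward plus discounted downstream value
                v = r[s][x] + gamma * sum(
                    prob * V[t + 1][s2] for s2, prob in p[s][x].items()
                )
                if v > best_v:
                    best_x, best_v = x, v
            V[t][s] = best_v  # memoized: each sub-problem is solved only once
            policy[(t, s)] = best_x
    return V, policy


# Tiny illustrative instance: two locations, deterministic moves.
states = ["A", "B"]
r = {"A": {"A": 1.0, "B": 2.0}, "B": {"A": 0.5, "B": 1.5}}
p = {s: {x: {x: 1.0} for x in states} for s in states}
V, policy = backward_dp(states, lambda s: states, p, r, T=3)
print(V[0]["A"], policy[(0, "A")])
```

Because \( V_{t+1} \) is stored before \( V_{t} \) is computed, repetitive sub-problems such as those discussed in Appendix C are evaluated only once.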

Appendix C. Optimal policy in a small-scale example

This section provides details on the optimal policy of the small-scale example used in Section 5. Figure 10 shows the decision tree. Notice that the decision tree consists of repetitive sub-problems. For example, the sub-problem of being in node A at \( t = 2 \) appears three times in the decision tree: (i) if \( x_{0} = \text {A} \) and \( x_{1} = \text {A} \), (ii) if \( x_{0} = \text {B} \) and \( x_{1} = \text {A} \), and (iii) if \( x_{0} = \text {C} \) and \( x_{1} = \text {A} \). In backward dynamic programming, the values of sub-problems are memoized once they are calculated for the first time, which reduces the total computation time. In the decision tree, the values atop the nodes are the expected values when following an optimal policy. Such a policy considers the immediate reward (resulting from the demand) as well as the downstream value (resulting from future periods). These values are calculated backward in time, i.e., first the values in \( t = 2 \) are calculated, then the values in \( t = 1 \), and so on. As explained in Section 4.3, the value of the optimal policy does not simply result from choosing the move with the highest expected value. Instead, it is calculated by considering the optimal decision under each possible demand combination. The algorithm for this is provided in Fig. 4b and is applied exemplarily to nodes A, B, and A of periods 2, 1, and 0 in Formulas (20)–(22), respectively; a short script after the formulas reproduces the calculation of Formula (20).

$$\begin{aligned} V_{2}(S_{2}|l_{2}=A) =~&(0.33) \cdot 66~ \nonumber \\&+((1-0.33) \cdot 0.61) \cdot 46~ \nonumber \\&+((1 - 0.33 - (1-0.33) \cdot 0.61) \cdot 0.36) \cdot 18 = 42.27 \end{aligned}$$
(20)
$$\begin{aligned} V_{1}(S_{1}|l_{1}=B) =~&\underbrace{(0.60 )}_{p^{r}_{0}}\cdot (72 + \gamma \cdot 79.61)~\nonumber \\&+\underbrace{((1-p^{r}_{0}) \cdot 0.79)}_{p^{r}_{1}} \cdot (33 + \gamma \cdot 42.27)~\nonumber \\&+\underbrace{((1 - p^{r}_{0} - p^{r}_{1}) \cdot 0.39)}_{p^{r}_{2}} \cdot (13 + \gamma \cdot 66.27)~\nonumber \\&+(1 - p^{r}_{0} - p^{r}_{1} - p^{r}_{2}) \cdot (0 + \gamma \cdot 79.61) = 107.96 \end{aligned}$$
(21)
$$\begin{aligned} V_{0}(S_{0}|l_{0}=A) =~&\underbrace{(0.49 )}_{p^{r}_{0}} \cdot (98 + \gamma \cdot 92.01)~\nonumber \\&+\underbrace{((1-p^{r}_{0}) \cdot 0.62)}_{p^{r}_{1}} \cdot (52 + \gamma \cdot 107.96)~\nonumber \\&+\underbrace{((1 - p^{r}_{0} - p^{r}_{1}) \cdot 0.74)}_{p^{r}_{2}} \cdot (28 + \gamma \cdot 115.15)~\nonumber \\&+(1 - p^{r}_{0} - p^{r}_{1} - p^{r}_{2}) \cdot (0 + \gamma \cdot 115.15) = 149.71 \end{aligned}$$
(22)
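
As a sanity check, the following short Python snippet reproduces the calculation pattern of Formula (20): the demand options are considered in order, and each option's reward is weighted by the probability that it realizes given that no higher-valued option did. Formulas (21) and (22) follow the same pattern but additionally include discounted downstream values; the reported numbers are consistent with a discount factor of \( \gamma = 0.8 \).

```python
# Reproduces Formula (20): expected value in node A at t = 2.
# Option i realizes with conditional probability probs[i], given that
# no earlier (higher-valued) option realized.
probs = [0.33, 0.61, 0.36]   # conditional probabilities of the demand options
rewards = [66, 46, 18]       # immediate rewards of the demand options

remaining = 1.0              # probability that no earlier option realized
expected = 0.0
for p_i, v_i in zip(probs, rewards):
    expected += remaining * p_i * v_i
    remaining *= 1.0 - p_i
print(round(expected, 2))    # 42.27, matching V_2(S_2 | l_2 = A)
```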
Fig. 10

Decision tree for the example problem. Values atop the arcs give the demand and its probability; values atop the decision nodes give the discounted downstream value when following an optimal policy.

About this article

Cite this article

Heinold, A. A tutorial on value function approximation for stochastic and dynamic transportation. 4OR-Q J Oper Res 22, 145–173 (2024). https://doi.org/10.1007/s10288-023-00539-3

