
A tutorial on value function approximation for stochastic and dynamic transportation

  • Educational Paper
  • Published in: 4OR

Abstract

This paper provides an introductory tutorial on Value Function Approximation (VFA), a solution class from Approximate Dynamic Programming. VFA is a heuristic approach to solving sequential decision processes such as Markov Decision Processes. Real-world problems in supply chain management (and beyond) that contain dynamic and stochastic elements can be modeled as such processes, but large-scale instances cannot be solved to optimality by enumeration due to the curses of dimensionality. VFA can be a suitable method for these cases, and this tutorial is designed to ease its use in research, practice, and education. To this end, the tutorial describes VFA in the context of stochastic and dynamic transportation and makes three main contributions. First, it gives a concise theoretical overview of VFA’s fundamental concepts, outlines a generic VFA algorithm, and briefly discusses advanced topics of VFA. Second, the VFA algorithm is applied to the taxicab problem, an easy-to-understand transportation planning task. Detailed step-by-step results are presented for a small-scale instance, allowing readers to gain an intuition for VFA’s main principles. Third, larger instances are solved by enhancing the basic VFA algorithm, demonstrating its general capability to approach more complex problems. The experiments are done with artificial instances, and the respective Python scripts are part of an electronic appendix. Overall, the tutorial provides the necessary knowledge to apply VFA to a wide range of stochastic and dynamic settings and addresses researchers, lecturers, tutors, students, and practitioners alike.


Data availability statement

The experiments used artificial data generated with Python. The scripts for the algorithms are part of Appendix A.


Author information

Correspondence to Arne Heinold.

Ethics declarations

Conflict of interest

The research leading to this tutorial received funding from the German Research Foundation (DFG) under reference 268276815. The tutorial is part of my PhD thesis, which would not have been possible without the generous support of my supervisor, who gave me the freedom to write this spin-off paper. Finally, I thank the journal’s review team for their valuable comments, which helped to improve the manuscript considerably.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (xlsx 11 KB)

Supplementary file 2 (py 12 KB)

Appendices

Appendix A. Python scripts

The Python scripts with the implementations of the two solution approaches (backward dynamic programming, VFA) are available as additional files in the electronic appendix of this paper and in the following repository: https://github.com/aheiX/Tutorial-on-Value-Function-Approximation. These scripts can be used to reproduce results from the experimental study. All data and materials are also available from the corresponding author on request.

Appendix B. Additional algorithms

This section describes two algorithms: first, Algorithm 2 gives a generic description of solving the Bellman equation by backward dynamic programming and, second, Algorithm 3 extends the generic VFA algorithm (see Algorithm 1 in Section 5) with a greedy exploration strategy. A minimal Python sketch of the backward procedure follows the algorithm figures.

figure b: Algorithm 2 (backward dynamic programming)
figure c: Algorithm 3 (VFA with greedy exploration)
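
For concreteness, the following minimal Python sketch illustrates the idea behind Algorithm 2: the Bellman equation is solved backward in time over a finite horizon, memoizing the values of period \( t+1 \) before computing those of period \( t \). The model components (states, decision sets, transition probabilities, expected rewards, discount factor) are generic, illustrative placeholders and not the paper's taxicab instance.

```python
# Minimal sketch of backward dynamic programming (cf. Algorithm 2).
# The model below is an illustrative placeholder, not the taxicab instance.

def backward_dp(states, decisions, p, r, T, gamma=0.8):
    """Solve the Bellman equation backward in time.

    p[s][x]: dict mapping successor state -> transition probability
    r[s][x]: expected immediate reward of decision x in state s
    Returns the value functions V[t][s] and a greedy policy.
    """
    V = {T: {s: 0.0 for s in states}}  # terminal values are zero
    policy = {}
    for t in reversed(range(T)):  # t = T-1, ..., 0
        V[t] = {}
        for s in states:
            best_x, best_v = None, float("-inf")
            for x in decisions(s):
                # Bellman equation: immediate reward plus discounted downstream value
                v = r[s][x] + gamma * sum(
                    prob * V[t + 1][s2] for s2, prob in p[s][x].items()
                )
                if v > best_v:
                    best_x, best_v = x, v
            V[t][s] = best_v  # memoized: each sub-problem is solved only once
            policy[(t, s)] = best_x
    return V, policy


# Tiny illustrative instance: two locations, deterministic moves.
states = ["A", "B"]
r = {"A": {"A": 1.0, "B": 2.0}, "B": {"A": 0.5, "B": 1.5}}
p = {s: {x: {x: 1.0} for x in states} for s in states}
V, policy = backward_dp(states, lambda s: states, p, r, T=3)
print(V[0]["A"], policy[(0, "A")])
```

Because \( V_{t+1} \) is stored before \( V_{t} \) is computed, repetitive sub-problems such as those discussed in Appendix C are evaluated only once.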

Appendix C. Optimal policy in a small-scale example

This section provides details on the optimal policy of the small-scale example used in Section 5. Figure 10 shows the decision tree. Notice that the decision tree consists of repetitive sub-problems. For example, the sub-problem of being in node A at \( t = 2 \) appears three times in the decision tree: (i) if \( x_{0} = \text {A} \) and \( x_{1} = \text {A} \), (ii) if \( x_{0} = \text {B} \) and \( x_{1} = \text {A} \), and (iii) if \( x_{0} = \text {C} \) and \( x_{1} = \text {A} \). In backward dynamic programming, the values of sub-problems are memoized once they are calculated for the first time, which reduces the total computation time. In the decision tree, the values atop the nodes are the expected values when following an optimal policy. Such a policy considers the immediate reward (resulting from the demand) as well as the downstream value (resulting from future periods). These values are calculated backward in time, i.e., first the values in \( t = 2 \) are calculated, then the values in \( t = 1 \), and so on. As explained in Section 4.3, the value of the optimal policy does not simply result from choosing the move with the highest expected value. Instead, it is calculated by considering the optimal decision under each possible demand combination. The algorithm for this is provided in Fig. 4b and is applied exemplarily to nodes A, B, and A of periods 2, 1, and 0 in Formulas (20)–(22), respectively; a short script after the formulas reproduces the calculation of Formula (20).

$$\begin{aligned} V_{2}(S_{2}|l_{2}=A) =~&(0.33) \cdot 66~ \nonumber \\&+((1-0.33) \cdot 0.61) \cdot 46~ \nonumber \\&+((1 - 0.33 - (1-0.33) \cdot 0.61) \cdot 0.36) \cdot 18 = 42.27 \end{aligned}$$
(20)
$$\begin{aligned} V_{1}(S_{1}|l_{1}=B) =~&\underbrace{(0.60 )}_{p^{r}_{0}}\cdot (72 + \gamma \cdot 79.61)~\nonumber \\&+\underbrace{((1-p^{r}_{0}) \cdot 0.79)}_{p^{r}_{1}} \cdot (33 + \gamma \cdot 42.27)~\nonumber \\&+\underbrace{((1 - p^{r}_{0} - p^{r}_{1}) \cdot 0.39)}_{p^{r}_{2}} \cdot (13 + \gamma \cdot 66.27)~\nonumber \\&+(1 - p^{r}_{0} - p^{r}_{1} - p^{r}_{2}) \cdot (0 + \gamma \cdot 79.61) = 107.96 \end{aligned}$$
(21)
$$\begin{aligned} V_{0}(S_{0}|l_{0}=A) =~&\underbrace{(0.49 )}_{p^{r}_{0}} \cdot (98 + \gamma \cdot 92.01)~\nonumber \\&+\underbrace{((1-p^{r}_{0}) \cdot 0.62)}_{p^{r}_{1}} \cdot (52 + \gamma \cdot 107.96)~\nonumber \\&+\underbrace{((1 - p^{r}_{0} - p^{r}_{1}) \cdot 0.74)}_{p^{r}_{2}} \cdot (28 + \gamma \cdot 115.15)~\nonumber \\&+(1 - p^{r}_{0} - p^{r}_{1} - p^{r}_{2}) \cdot (0 + \gamma \cdot 115.15) = 149.71 \end{aligned}$$
(22)
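
As a sanity check, the following short Python snippet reproduces the calculation pattern of Formula (20): the demand options are considered in order, and each option's reward is weighted by the probability that it realizes given that no higher-valued option did. Formulas (21) and (22) follow the same pattern but additionally include discounted downstream values; the reported numbers are consistent with a discount factor of \( \gamma = 0.8 \).

```python
# Reproduces Formula (20): expected value in node A at t = 2.
# Option i realizes with conditional probability probs[i], given that
# no earlier (higher-valued) option realized.
probs = [0.33, 0.61, 0.36]   # conditional probabilities of the demand options
rewards = [66, 46, 18]       # immediate rewards of the demand options

remaining = 1.0              # probability that no earlier option realized
expected = 0.0
for p_i, v_i in zip(probs, rewards):
    expected += remaining * p_i * v_i
    remaining *= 1.0 - p_i
print(round(expected, 2))    # 42.27, matching V_2(S_2 | l_2 = A)
```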
Fig. 10

Decision tree for the example problem. Values atop the arcs give the demand and its probability; values atop the decision nodes give the discounted downstream value when following an optimal policy.

About this article

Cite this article

Heinold, A. A tutorial on value function approximation for stochastic and dynamic transportation. 4OR-Q J Oper Res 22, 145–173 (2024). https://doi.org/10.1007/s10288-023-00539-3

