
A reinforcement learning approach to convoy scheduling on a contested transportation network

  • Original Paper
  • Published in Optimization Letters

Abstract

Ambushes, in the form of improvised explosive devices (IEDs), have posed a grave risk to targeted vehicles operating on supply routes in recent theaters of war. History shows that this is an enduring problem that US military forces will likely face again. This paper introduces a fundamental reinforcement learning (RL) model for determining convoy schedules and route clearance assignments in light of attack costs on a transportation network subject to IED ambushes. The model represents opponent interaction by assuming dependence between attack probabilities and targeted traffic patterns. There are currently few analytical approaches to this problem in the literature, but RL algorithms offer opportunities for meaningful improvement by optimizing individual movements across an extended planning horizon, accounting for downstream attacker-defender interaction. To our knowledge this approach has not been pursued elsewhere; this paper therefore introduces the RL methodology with a fundamental formulation and initial computational results that show meaningful performance improvements over a one-step, myopic decision rule.
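
To make the contrast with a one-step rule concrete, the following toy sketch trains a tabular Q-learning policy for a convoy-routing problem in which the adversary tends to re-target the route the defender just used. Everything here (the three-route network, attack probability, re-targeting behavior, and all parameter values) is our own illustrative assumption, not the paper's actual model.

import random

# Hypothetical toy instance: the paper's network, cost structure, and attack
# model are not given here, so every value below is an assumption.
# State: the route the adversary is currently targeting.
# Action: the route on which to send the next convoy.
N_ROUTES = 3
ATTACK_PROB = 0.6    # chance of ambush on the targeted route (assumed)
ATTACK_COST = 10.0   # cost of a successful ambush (assumed)
RETARGET_PROB = 0.8  # adversary shifts attention to the route just used (assumed)
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1  # discount, step size, exploration rate

def step(targeted, action, rng):
    """One attacker-defender interaction: the convoy moves, the adversary adapts."""
    cost = ATTACK_COST if (action == targeted and rng.random() < ATTACK_PROB) else 0.0
    next_targeted = action if rng.random() < RETARGET_PROB else rng.randrange(N_ROUTES)
    return cost, next_targeted

def train(episodes=5000, horizon=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ROUTES for _ in range(N_ROUTES)]  # q[state][action]: expected cost
    for _ in range(episodes):
        s = rng.randrange(N_ROUTES)
        for _ in range(horizon):
            if rng.random() < EPSILON:
                a = rng.randrange(N_ROUTES)  # explore
            else:
                a = min(range(N_ROUTES), key=lambda x: q[s][x])  # exploit (min cost)
            cost, s2 = step(s, a, rng)
            # Q-learning update toward one-period cost plus discounted best cost-to-go.
            q[s][a] += ALPHA * (cost + GAMMA * min(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
for s in range(N_ROUTES):
    best = min(range(N_ROUTES), key=lambda a: q[s][a])
    print(f"adversary targeting route {s}: send convoy on route {best}")

A myopic rule would only avoid the currently targeted route each period; the learned value function also prices in how today's routing choice shifts the adversary's attention in later periods, which is the downstream interaction the abstract refers to.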


Notes

  1. ADP is a family of algorithms used to solve stochastic optimization problems formulated as infinite-horizon dynamic programs; see [11, 14, 27]. The Bellman optimality equation that these methods approximate is sketched below.
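
For orientation, a generic statement of the infinite-horizon discounted Bellman optimality equation that ADP methods approximate (our notation, not the paper's):

    V(s) = \min_{a \in \mathcal{A}(s)} \left\{ c(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \right\}

where c(s,a) is the expected one-period cost of taking action a in state s, P(s' | s, a) is the transition probability, and \gamma \in [0,1) is the discount factor. ADP methods replace V with a learned approximation when the state space is too large to solve exactly.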

References

  1. US Department of the Army, FM 3–24 / MCWP 3–33.5, Counterinsurgency (2006)

  2. Israeli, E., Wood, K.: Shortest-path network interdiction. Networks 40(2), 97–111 (2002)

  3. Tamta, P., Pande, B.P., Dhami, H.S.: Reduction of maximum flow network interdiction problem: step towards the polynomial time solutions. Int. J. Appl. Inf. Syst. 5(5), 25–29 (2013)

  4. Washburn, A.: Continuous network interdiction, report NPSOR-06-007. Naval Postgraduate School, Monterey (2006)

  5. Washburn, A., Ewing, P.L.: Allocation of clearance assets in IED warfare. Naval Res. Logist. 58, 180–187 (2011)

  6. Lin, K., Washburn, A.: The effect of decoys in IED warfare. Report prepared for Joint IED Defeat Organization. 5000 Army Pentagon, Washington D.C. (2010)

  7. Marks, C.E.: Optimization-based routing and scheduling of IED-detection assets in contemporary military operations. Master's Thesis, Massachusetts Institute of Technology, Cambridge (2009)

  8. Kolesar, P., Leister, K., Stimpson, D., Woodaman, R.: A simple model of improvised explosive device clearance. Ann. Oper. Res. 208(1), 451–468 (2013)

  9. Leister, K., Hudson, T.: Route clearance team scheduling, final report. Master's degree project course, George Mason University, Fairfax (2009)

  10. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)

  11. Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction. The MIT Press, Cambridge (1998)

  12. Powell, W.B., Simao, H.P., Bouzaiene-Ayari, B.: Approximate dynamic programming in transportation and logistics. EURO J. Transp. Logist. 1(3), 237–284 (2012)

  13. Bellman, R.E.: Dynamic programming. Princeton University Press, Princeton (1957)

  14. Powell, W.B.: Approximate dynamic programming: solving the curses of dimensionality. Wiley, New York (2007)

  15. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-dynamic programming. Athena Scientific, Belmont (1996)

  16. Gosavi, A.: Reinforcement learning: a tutorial survey and recent advances. INFORMS J. Comput. 21, 178–192 (2009)

  17. Balakrishna, P.: Scalable approximate dynamic programming models with applications in air transport. PhD Dissertation, George Mason University, Fairfax (2009)

  18. Hinton, T.G.: A thesis regarding the vehicle routing problem including a range of novel techniques for its solution. PhD Dissertation, University of Bristol, Bristol (2010)

  19. Stimpson, D.: Thinking about IED warfare. Marine Corps Gazette, pp. 35–42 (2011)

  20. Yamada, I., Thill, J.-C.: Local indicators of network-constrained clusters in spatial point patterns. Geogr. Anal. 39(3), 268–292 (2007)

  21. Okabe, A., Satoh, T., Furuta, T., Suzuki, A., Okano, K.: Generalized network Voronoi diagrams: concepts, computational methods, and applications. Int. J. Geogr. Inf. Sci. 22(9), 965–994 (2008)

  22. Xie, Z., Yan, J.: Kernel density estimation of traffic accidents in a network space. Comput. Environ. Urban Syst. 32(5), 396–406 (2008)

  23. Jonsson, G.K.: Hidden temporal pattern in interaction. PhD Thesis, University of Aberdeen, Aberdeen (2011)

  24. Salah, A.A., Pauwels, E., Tavenard, R., Gevers, T.: T-patterns revisited: mining for temporal patterns in sensor data. Sensors 10, 7496–7513 (2010)

  25. Lu, Y., Chen, X.: False alarm of planar K-function when analyzing urban crime distributed along streets. Soc. Sci. Res. 36(2), 611–632 (2007)

  26. Keefe, R., Sullivan, T.: Resource-constrained spatial hot spot identification. RAND Corporation, Arlington (2011)

  27. Bertsekas, D.P.: Dynamic programming and optimal control, 3rd edn, vol. II. Athena Scientific, Belmont (2011)


Author information

Correspondence to Daniel Stimpson.

About this article

Cite this article

Stimpson, D., Ganesan, R. A reinforcement learning approach to convoy scheduling on a contested transportation network. Optim Lett 9, 1641–1657 (2015). https://doi.org/10.1007/s11590-015-0875-6

