Abstract
Ambushes in the form of improvised explosive devices (IEDs) have posed a grave risk to vehicles operating on supply routes in recent theaters of war, and history suggests this is an enduring problem that US military forces will likely face again. This paper introduces a fundamental reinforcement learning (RL) model for determining convoy schedules and route clearance assignments in light of attack costs on a transportation network subject to IED ambushes. The model represents opponent interaction by assuming that attack probabilities depend on observed traffic patterns. Few analytical approaches to this problem exist in the literature, but RL algorithms offer meaningful improvements by optimizing individual movements across an extended planning horizon and accounting for downstream attacker-defender interaction. To our knowledge this approach has not been pursued elsewhere; this paper therefore introduces the RL methodology with a fundamental formulation and initial computational results that show meaningful performance improvements over a one-step, myopic decision rule.
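To illustrate the distinction the abstract draws between a one-step, myopic rule and an RL policy that accounts for downstream attacker-defender interaction, the sketch below implements a toy tabular Q-learning model. It is not the paper's formulation: the two-route network, the cost parameters, and the assumption that ambush probability grows with recent traffic on a route are all illustrative choices made only for this example.

```python
import random
from collections import defaultdict

# Illustrative toy model (not the paper's formulation): two candidate routes,
# where ambush probability on a route rises with how often it was used recently,
# so repeatedly taking the "cheap" route invites attacks on later movements.

ROUTES = [0, 1]
BASE_ATTACK_P = [0.05, 0.10]   # baseline ambush probability per route (assumed)
TRAFFIC_SENSITIVITY = 0.15     # added probability per recent use (assumed)
ATTACK_COST = 10.0             # cost incurred if the convoy is ambushed
TRAVEL_COST = [1.0, 0.5]       # route 1 is shorter but riskier at baseline
MEMORY = 3                     # attacker "remembers" the last MEMORY movements

def attack_prob(route, history):
    """Ambush probability grows with recent traffic on the chosen route."""
    recent_uses = sum(1 for r in history if r == route)
    return min(0.95, BASE_ATTACK_P[route] + TRAFFIC_SENSITIVITY * recent_uses)

def step(route, history, rng):
    """Return (cost, next_history) after sending the convoy down `route`."""
    cost = TRAVEL_COST[route]
    if rng.random() < attack_prob(route, history):
        cost += ATTACK_COST
    return cost, (history + (route,))[-MEMORY:]

def q_learning(episodes=20000, horizon=20, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning; the state is the tuple of the last MEMORY route choices."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        history = ()
        for _ in range(horizon):
            if rng.random() < eps:
                route = rng.choice(ROUTES)
            else:
                route = min(ROUTES, key=lambda a: Q[(history, a)])
            cost, next_history = step(route, history, rng)
            best_next = min(Q[(next_history, a)] for a in ROUTES)
            Q[(history, route)] += alpha * (cost + gamma * best_next - Q[(history, route)])
            history = next_history
    return Q

def evaluate(policy, episodes=2000, horizon=20, seed=1):
    """Average total cost per episode under a given routing policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        history = ()
        for _ in range(horizon):
            route = policy(history)
            cost, history = step(route, history, rng)
            total += cost
    return total / episodes

if __name__ == "__main__":
    Q = q_learning()
    rl_policy = lambda h: min(ROUTES, key=lambda a: Q[(h, a)])
    # Myopic rule: minimize the immediate expected cost of the next movement only.
    myopic = lambda h: min(ROUTES, key=lambda a: TRAVEL_COST[a] + ATTACK_COST * attack_prob(a, h))
    print("RL policy avg cost per episode:     %.2f" % evaluate(rl_policy))
    print("Myopic policy avg cost per episode: %.2f" % evaluate(myopic))
```

The myopic rule here considers only the expected cost of the next movement, while the Q-learning policy also accounts for how today's route choice shifts ambush probabilities on later movements, which is the kind of downstream attacker-defender interaction the abstract refers to.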








Cite this article
Stimpson, D., Ganesan, R. A reinforcement learning approach to convoy scheduling on a contested transportation network. Optim Lett 9, 1641–1657 (2015). https://doi.org/10.1007/s11590-015-0875-6