
A reinforcement learning approach to convoy scheduling on a contested transportation network

  • Original Paper
  • Published in Optimization Letters

Abstract

Ambushes, in the form of improvised explosive devices (IEDs), have posed a grave risk to targeted vehicles operating on supply routes in recent theaters of war. History shows that this is an enduring problem that US military forces will likely face again. This paper introduces a fundamental reinforcement learning (RL) model for determining convoy schedules and route clearance assignments in light of attack costs on a transportation network subject to IED ambushes. The model represents opponent interaction by assuming dependence between attack probabilities and targeted traffic patterns. There are currently few analytical approaches to this problem in the literature, but RL algorithms offer opportunities for meaningful improvement by optimizing individual movements across an extended planning horizon, accounting for downstream attacker-defender interaction. To our knowledge this approach has not been pursued elsewhere; this paper therefore introduces the RL methodology with a fundamental formulation and initial computational results that show meaningful performance improvements over a one-step, myopic decision rule.
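
To make the contrast with a one-step rule concrete, the following toy sketch trains a tabular Q-learning policy for a convoy-routing problem in which the adversary tends to re-target the route the defender just used. Everything here (the three-route network, attack probability, re-targeting behavior, and all parameter values) is our own illustrative assumption, not the paper's actual model.

import random

# Hypothetical toy instance: the paper's network, cost structure, and attack
# model are not given here, so every value below is an assumption.
# State: the route the adversary is currently targeting.
# Action: the route on which to send the next convoy.
N_ROUTES = 3
ATTACK_PROB = 0.6    # chance of ambush on the targeted route (assumed)
ATTACK_COST = 10.0   # cost of a successful ambush (assumed)
RETARGET_PROB = 0.8  # adversary shifts attention to the route just used (assumed)
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1  # discount, step size, exploration rate

def step(targeted, action, rng):
    """One attacker-defender interaction: the convoy moves, the adversary adapts."""
    cost = ATTACK_COST if (action == targeted and rng.random() < ATTACK_PROB) else 0.0
    next_targeted = action if rng.random() < RETARGET_PROB else rng.randrange(N_ROUTES)
    return cost, next_targeted

def train(episodes=5000, horizon=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ROUTES for _ in range(N_ROUTES)]  # q[state][action]: expected cost
    for _ in range(episodes):
        s = rng.randrange(N_ROUTES)
        for _ in range(horizon):
            if rng.random() < EPSILON:
                a = rng.randrange(N_ROUTES)  # explore
            else:
                a = min(range(N_ROUTES), key=lambda x: q[s][x])  # exploit (min cost)
            cost, s2 = step(s, a, rng)
            # Q-learning update toward one-period cost plus discounted best cost-to-go.
            q[s][a] += ALPHA * (cost + GAMMA * min(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
for s in range(N_ROUTES):
    best = min(range(N_ROUTES), key=lambda a: q[s][a])
    print(f"adversary targeting route {s}: send convoy on route {best}")

A myopic rule would only avoid the currently targeted route each period; the learned value function also prices in how today's routing choice shifts the adversary's attention in later periods, which is the downstream interaction the abstract refers to.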


Notes

  1. ADP is a family of algorithms used to solve stochastic optimization problems formulated as infinite-horizon dynamic programs; see [11, 14, 27]. The Bellman optimality equation that these methods approximate is sketched below.
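
For orientation, a generic statement of the infinite-horizon discounted Bellman optimality equation that ADP methods approximate (our notation, not the paper's):

    V(s) = \min_{a \in \mathcal{A}(s)} \left\{ c(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \right\}

where c(s,a) is the expected one-period cost of taking action a in state s, P(s' | s, a) is the transition probability, and \gamma \in [0,1) is the discount factor. ADP methods replace V with a learned approximation when the state space is too large to solve exactly.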

References

  1. US Department of the Army, FM 3–24 / MCWP 3–33.5, Counterinsurgency (2006)

  2. Israeli, E., Wood, K.: Shortest-path network interdiction. Networks 40(2), 97–111 (2002)

  3. Tamta, P., Pande, B.P., Dhami, H.S.: Reduction of maximum flow network interdiction problem: step towards the polynomial time solutions. Int. J. Appl. Inf. Syst. 5(5), 25–29 (2013)

  4. Washburn, A.: Continuous network interdiction, report NPSOR-06-007. Naval Postgraduate School, Monterey (2006)

  5. Washburn, A., Ewing, P.L.: Allocation of clearance assets in IED warfare. Naval Res. Logist. 58, 180–187 (2011)

  6. Lin, K., Washburn, A.: The effect of decoys in IED warfare. Report prepared for Joint IED Defeat Organization. 5000 Army Pentagon, Washington D.C. (2010)

  7. Marks, C.E.: Optimization-based routing and scheduling of IED-detection assets in contemporary military operations. Master's Thesis, Massachusetts Institute of Technology, Cambridge (2009)

  8. Kolesar, P., Leister, K., Stimpson, D., Woodaman, R.: A simple model of improvised explosive device clearance. Ann. Oper. Res. 208(1), 451–468 (2013)

  9. Leister, K., Hudson, T.: Route clearance team scheduling, final report. Master's degree project course, George Mason University, Fairfax (2009)

  10. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)

  11. Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction. The MIT Press, Cambridge (1998)

  12. Powell, W.B., Simao, H.P., Bouzaiene-Ayari, B.: Approximate dynamic programming in transportation and logistics. EURO J. Transp. Logist. 1(3), 237–284 (2012)

  13. Bellman, R.E.: Dynamic programming. Princeton University Press, Princeton (1957)

  14. Powell, W.B.: Approximate dynamic programming: solving the curses of dimensionality. Wiley, New York (2007)

  15. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-dynamic programming. Athena Scientific, Belmont (1996)

  16. Gosavi, A.: Reinforcement learning: a tutorial survey and recent advances. INFORMS J. Comput. 21, 178–192 (2009)

  17. Balakrishna, P.: Scalable approximate dynamic programming models with applications in air transport. PhD Dissertation, George Mason University, Fairfax (2009)

  18. Hinton, T.G.: A thesis regarding the vehicle routing problem including a range of novel techniques for its solution. PhD Dissertation, University of Bristol, Bristol (2010)

  19. Stimpson, D.: Thinking about IED warfare. Marine Corps Gazette, pp. 35–42 (2011)

  20. Yamada, I., Thill, J.-C.: Local indicators of network-constrained clusters in spatial point patterns. Geogr. Anal. 39(3), 268–292 (2007)

  21. Okabe, A., Satoh, T., Furuta, T., Suzuki, A., Okano, K.: Generalized network Voronoi diagrams: concepts, computational methods, and applications. Int. J. Geogr. Inf. Sci. 22(9), 965–994 (2008)

  22. Xie, Z., Yan, J.: Kernel density estimation of traffic accidents in a network space. Comput. Environ. Urban Syst. 32(5), 396–406 (2008)

  23. Jonsson, G.K.: Hidden temporal pattern in interaction. PhD Thesis, University of Aberdeen, Aberdeen (2011)

  24. Salah, A.A., Pauwels, E., Tavenard, R., Gevers, T.: T-patterns revisited: mining for temporal patterns in sensor data. Sensors 10, 7496–7513 (2010)

  25. Lu, Y., Chen, X.: False alarm of planar K-function when analyzing urban crime distributed along streets. Soc. Sci. Res. 36(2), 611–632 (2007)

  26. Keefe, R., Sullivan, T.: Resource-constrained spatial hot spot identification. RAND Corporation, Arlington (2011)

  27. Bertsekas, D.P.: Dynamic programming and optimal control, 3rd edn, vol. II. Athena Scientific, Belmont (2011)


Author information

Correspondence to Daniel Stimpson.

About this article

Cite this article

Stimpson, D., Ganesan, R. A reinforcement learning approach to convoy scheduling on a contested transportation network. Optim Lett 9, 1641–1657 (2015). https://doi.org/10.1007/s11590-015-0875-6

