Abstract
In this paper, we present a parameter-free variation of the Sampled Fictitious Play algorithm that facilitates fast solution of deterministic dynamic programming problems. Its random tie-breaking procedure imparts a natural randomness to the algorithm, which prevents it from “getting stuck” at a locally optimal solution and allows the discovery of an optimal path in a finite number of iterations. Furthermore, we illustrate through an application to maritime navigation that, in practice, the parameter-free Sampled Fictitious Play algorithm finds a high-quality solution after only a few iterations, in contrast with traditional methods.
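To make the idea concrete, the following is a minimal illustrative sketch of a sampled-fictitious-play style iteration on a small acyclic shortest-path problem. It is not the authors' implementation. The example graph, the function names (`cost_to_go`, `sampled_fictitious_play`), the fixed iteration count, and the choice to have each state draw a single action uniformly from its own history are all assumptions made for illustration. Each non-terminal state is treated as a player: it samples one past action per player, computes a best reply against those samples, and breaks ties uniformly at random.

```python
import random


def cost_to_go(policy, edges, state, goal):
    """Cost of following `policy` from `state` to `goal` (graph assumed acyclic)."""
    total = 0.0
    while state != goal:
        nxt = policy[state]
        total += edges[state][nxt]
        state = nxt
    return total


def sampled_fictitious_play(edges, start, goal, iterations=50, seed=0):
    """Hypothetical sketch: returns (best cost found, corresponding path)."""
    rng = random.Random(seed)
    states = [s for s in edges if s != goal]
    # History of past plays for each state (player), seeded with a random action.
    history = {s: [rng.choice(sorted(edges[s]))] for s in states}
    best_cost, best_policy = float("inf"), None

    for _ in range(iterations):
        # Each player draws one action uniformly from its own history.
        sampled = {s: rng.choice(history[s]) for s in states}

        reply = {}
        for s in states:
            # Best reply of s, holding all other players at their sampled actions.
            values = {a: c + cost_to_go(sampled, edges, a, goal)
                      for a, c in edges[s].items()}
            lowest = min(values.values())
            ties = sorted(a for a, v in values.items() if v == lowest)
            reply[s] = rng.choice(ties)  # random tie-breaking

        # Append the best replies to the histories.
        for s in states:
            history[s].append(reply[s])

        # Keep the best start-to-goal path seen so far.
        cost = cost_to_go(reply, edges, start, goal)
        if cost < best_cost:
            best_cost, best_policy = cost, dict(reply)

    path = [start]
    while path[-1] != goal:
        path.append(best_policy[path[-1]])
    return best_cost, path


# Assumed toy instance: edges[u][v] is the cost of moving from u to v.
edges = {
    "s": {"a": 2, "b": 1},
    "a": {"c": 2, "d": 5},
    "b": {"c": 4, "d": 1},
    "c": {"t": 1},
    "d": {"t": 3},
}
cost, path = sampled_fictitious_play(edges, "s", "t")
print(cost, path)
```

In this toy instance two paths share the minimum cost, so the random tie-breaking decides which one is reported. The sketch has no step-size or sample-size parameter to tune, only an iteration budget, which is the sense in which it mirrors the parameter-free flavor described above.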
Acknowledgments
The authors would like to thank Okey Nwogu and Fernando Tavares for their assistance with implementation and numerical results. This work was supported in part by the Office of Naval Research through the Multidisciplinary University Research Initiative (MURI) Optimum Vessel Performance in Evolving Nonlinear Wave Fields Grant (N00014-05-1-0537).
Additional information
Communicated by Kyriakos G. Vamvoudakis.
Cite this article
Dolinskaya, I.S., Epelman, M.A., Şişikoğlu Sir, E. et al. Parameter-Free Sampled Fictitious Play for Solving Deterministic Dynamic Programming Problems. J Optim Theory Appl 169, 631–655 (2016). https://doi.org/10.1007/s10957-015-0798-5
DOI: https://doi.org/10.1007/s10957-015-0798-5