Abstract
We study the problem of inferring the discount factor of an agent optimizing a discounted reward objective in a finite-state Markov Decision Process (MDP). Discounted reward objectives are common in sequential optimization, reinforcement learning, and algorithmic game theory. The discount factor is an important parameter in formulating the discounted reward: it captures the “time value” of reward, i.e., how much reward at hand would equal a promised reward at a future time. Knowing an agent’s discount factor can provide valuable insight into their decision-making and help predict their preferences in previously unseen environments. However, pinpointing the exact value of the discount factor used by an agent is a challenging problem, and ad hoc guesses are often incorrect.
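(For concreteness, the objective in question is the standard discounted value of a policy; the notation below is ours, introduced only to fix ideas, and is not taken from the paper.)

\[
V^{\pi}_{\gamma}(s) \;=\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\middle|\; s_0 = s\right], \qquad \gamma \in [0, 1),
\]

so a discount factor near 1 weighs future rewards almost as heavily as immediate ones, while a factor near 0 makes the agent myopic.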
This paper focuses on the problem of computing the range of possible discount factors for a rational agent, given their policy. A naive solution to this problem can be quite expensive. A classic result by Smallwood shows that the interval [0, 1) of possible discount factors can be partitioned into finitely many sub-intervals such that the optimal policy remains the same within each sub-interval; furthermore, the optimal policies of neighboring sub-intervals differ at a single state. We show how Smallwood’s result can be exploited to search for the discount factor intervals over which a given policy is optimal, by reducing the search to polynomial root isolation. We extend the result to situations where the policy is suboptimal but has a value function that is close to optimal. Finally, we develop numerical approaches to solve the discount factor elicitation problem and demonstrate the effectiveness of our algorithms through case studies.
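To make the reduction concrete, here is a minimal sketch (ours, not the authors’ implementation): the toy MDP, the policy, and the tolerance are invented for illustration, and sympy’s polynomial root finding stands in for a dedicated real root isolation routine. The sketch treats the value of a fixed deterministic policy as a rational function of the discount factor, turns each one-step-deviation optimality condition into a polynomial inequality, and reads off the sub-intervals of [0, 1) on which the policy remains optimal.

```python
# Illustrative sketch (not the paper's implementation): for a toy 2-state,
# 2-action MDP and a fixed deterministic policy, find the sub-interval(s) of
# discount factors gamma in (0, 1) on which that policy is optimal.  The
# policy's value is a rational function of gamma, each one-step deviation
# gives a polynomial inequality in gamma, and the real roots of those
# polynomials are the only candidate interval endpoints.
import sympy as sp

g = sp.symbols('gamma')

# Toy MDP (made up): P[a][s][t] = transition probability, R[a][s] = reward.
# Action 0 keeps the current state; action 1 moves to the other state.
P = [[[sp.Integer(1), sp.Integer(0)], [sp.Integer(0), sp.Integer(1)]],   # action 0
     [[sp.Integer(0), sp.Integer(1)], [sp.Integer(1), sp.Integer(0)]]]   # action 1
R = [[sp.Integer(1), sp.Integer(2)],                                     # action 0
     [sp.Integer(0), sp.Integer(1)]]                                     # action 1
policy = [0, 0]                      # deterministic policy to analyze
n, m = len(policy), len(P)

# V_pi(gamma) = (I - gamma * P_pi)^{-1} R_pi  -- a rational function of gamma.
P_pi = sp.Matrix([[P[policy[s]][s][t] for t in range(n)] for s in range(n)])
R_pi = sp.Matrix([R[policy[s]][s] for s in range(n)])
V = (sp.eye(n) - g * P_pi).inv() * R_pi

# The policy is optimal at gamma iff no one-step deviation improves its value:
#   R[a][s] + gamma * sum_t P[a][s][t] * V[t]  <=  V[s]   for all s, a.
# Since det(I - gamma*P_pi) > 0 on [0, 1), only the numerator of each
# advantage can change sign; its roots in (0, 1) are candidate breakpoints.
breakpoints = {sp.Integer(0), sp.Integer(1)}
for s in range(n):
    for a in range(m):
        if a == policy[s]:
            continue                 # the policy's own action never improves on itself
        q_sa = R[a][s] + g * sum(P[a][s][t] * V[t] for t in range(n))
        num, _ = sp.fraction(sp.together(q_sa - V[s]))
        num = sp.expand(num)
        if sp.degree(num, g) < 1:    # constant advantage: no sign change
            continue
        for root in sp.Poly(num, g).real_roots():
            if 0 < root < 1:
                breakpoints.add(root)

# Between consecutive breakpoints the optimality status cannot change, so one
# sample point per open sub-interval suffices (endpoints ignored for brevity).
pts = sorted(breakpoints)
for lo, hi in zip(pts[:-1], pts[1:]):
    mid = (float(lo) + float(hi)) / 2
    v_mid = [float(V[s].subs(g, mid)) for s in range(n)]
    optimal = all(
        float(R[a][s]) + mid * sum(float(P[a][s][t]) * v_mid[t] for t in range(n))
        <= v_mid[s] + 1e-9
        for s in range(n) for a in range(m))
    if optimal:
        print(f"policy optimal for gamma in ({float(lo):.4f}, {float(hi):.4f})")
```

On this toy instance the only interior breakpoint is gamma = 1/2, and the sketch reports that the policy is optimal on (0, 0.5); handling a near-optimal rather than exactly optimal policy would presumably amount to relaxing the inequalities by a slack term, along the lines of the extension mentioned above.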
References
Agranov, M., Kim, J., Yariv, L.: Coordination with differential time preferences: experimental evidence. Technical report, National Bureau of Economic Research (2023)
Amit, R., Meir, R., Ciosek, K.: Discount factor as a regularizer in reinforcement learning. In: International Conference on Machine Learning, pp. 269–278. PMLR (2020)
Blackwell, D.: Discrete dynamic programming. Ann. Math. Stat. 33(2), 719–726 (1962)
Chen, B., Takahashi, S.: A folk theorem for repeated games with unequal discounting. Games Econ. Behav. 76(2), 571–581 (2012)
Collins, G.E., Akritas, A.G.: Polynomial real root isolation using Descartes’ rule of signs. In: Proceedings of the Third ACM Symposium on Symbolic and Algebraic Computation, SYMSAC 1976, pp. 272–275. Association for Computing Machinery, New York (1976). https://doi.org/10.1145/800205.806346
Faddeev, D.K., Faddeeva, V.N.: Computational Methods of Linear Algebra (trans. Williams, R.C.). W.H. Freeman, San Francisco (1963)
Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer, New York (2012)
Fisher, I.: The Theory of Interest. Macmillan, New York (1930)
François-Lavet, V., Fonteneau, R., Ernst, D.: How to discount deep reinforcement learning: towards new dynamic strategies. arXiv preprint arXiv:1512.02011 (2015)
Giwa, B.H., Lee, C.G.: A marginal log-likelihood approach for the estimation of discount factors of multiple experts in inverse reinforcement learning. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7786–7791 (2021). https://doi.org/10.1109/IROS51168.2021.9636479
Gurvich, V., Miltersen, P.B.: On the computational complexity of solving stochastic mean-payoff games. CoRR abs/0812.0486 (2008). http://arxiv.org/abs/0812.0486
Hu, H., Yang, Y., Zhao, Q., Zhang, C.: On the role of discount factor in offline reinforcement learning. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 9072–9098. PMLR (2022)
Jackson, M.O., Yariv, L.: Collective dynamic choice: the necessity of time inconsistency. Am. Econ. J. Microeconomics 7(4), 150–178 (2015)
Lehrer, E., Pauzner, A.: Repeated games with differential time preferences. Econometrica 67(2), 393–412 (1999)
Lehrer, E., Solan, E., Solan, O.N.: The value functions of Markov decision processes. Oper. Res. Lett. 44(5), 587–591 (2016)
Littman, M.L., Topcu, U., Fu, J., Isbell, C., Wen, M., MacGlashan, J.: Environment-independent task specifications via GLTL. arXiv preprint arXiv:1704.04341 (2017)
Mischel, W., Ebbesen, E.B., Raskoff Zeiss, A.: Cognitive and attentional mechanisms in delay of gratification. J. Pers. Soc. Psychol. 21(2), 204 (1972)
Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Patek, S.D., Bertsekas, D.P.: Stochastic shortest path games. SIAM J. Control. Optim. 37(3), 804–824 (1999)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Hoboken (2014)
Basu, S., Pollack, R., Roy, M.F.: Algorithms in Real Algebraic Geometry. Algorithms and Computation in Mathematics, vol. 10. Springer, Heidelberg (2006)
Sagraloff, M., Mehlhorn, K.: Computing real roots of real polynomials. J. Symb. Comput. 73, 46–86 (2016). https://doi.org/10.1016/j.jsc.2015.03.004. https://www.sciencedirect.com/science/article/pii/S0747717115000292
Smallwood, R.D.: Optimum policy regions for Markov processes with discounting. Oper. Res. 14(4), 658–669 (1966)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Tessler, C.: Deep reinforcement learning works - now what? (2020). https://tesslerc.github.io/posts/drl_works_now_what/
Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
Wilf, H.S.: Mathematics for the Physical Sciences. Courier Corporation (2013)
Acknowledgement
We would like to thank the anonymous reviewers for their insightful comments. This work was supported in part by the NSF CAREER Award CCF-2146563 and the NSF IUCRC Center for Autonomous Air Mobility and Sensing (CAAMS).