Mutually Dependent Markov Decision Processes

Toshiharu Fujita; Akifumi Kira

doi:10.20965/jaciii.2014.p0992

single-jc.php

« previous

JACIII Vol.18 No.6 pp. 992-998

doi: 10.20965/jaciii.2014.p0992

(2014)

Paper:

Views over last 60 days: 745

Mutually Dependent Markov Decision Processes

Toshiharu Fujita^* and Akifumi Kira^**

^*Graduate School of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata, Kitakyushu 804-8550, Japan

^**Graduate School of Economics and Management, Tohoku University, 27-1 Kawauchi, Aoba-ku, Sendai 980-8576, Japan

Received:

February 20, 2014

Accepted:

June 10, 2014

Published:

November 20, 2014

Keywords:

markov decision process, dynamic programming, nonserial system, additive reward

Abstract

In this paper, we introduce a basic framework for mutually dependent Markov decision processes (MDMDP) showing recursive mutual dependence. Our model is structured upon two types of finite-stage Markov decision processes. At each stage, the reward in one process is given by the optimal value of the alternative process problem, whose initial state is determined by the current state and decision in the original process. We formulate the MDMDP model and derive mutually dependent recursive equations by dynamic programming. Furthermore, MDMDP is illustrated in a numerical example. The model enables easier treatment of some classes of complex multi-stage decision processes.

Cite this article as:

T. Fujita and A. Kira, “Mutually Dependent Markov Decision Processes,” J. Adv. Comput. Intell. Intell. Inform., Vol.18 No.6, pp. 992-998, 2014.

Data files:

References

[1] R.E. Bellman, “Dynamic Programming,” Princeton Univ. Press, NJ, 1957.
[2] R. A. Howard, “Dynamic Programming and Markov Processes,” MITPress, Cambridge, Mass., 1960.
[3] G. L. Nemhauser, “Introduction to Dynamic Programming,” Wiley, NY, 1966.
[4] D.P. Bertsekas, “Dynamic Programming and Stochastic Control,” Academic Press, NY, 1976.
[5] D.J. White, “Dynamic Programming,” Holden-Day, San Francisco, Calif., 1969.
[6] M. L. Puterman, “Markov Decision Processes : discrete stochastic dynamic programming,” Wiley & Sons, NY, 1994.
[7] M. Sniedovich, “Dynamic Programming,” Marcel Dekker, Inc. NY, 1992.
[8] S. Iwamoto and T. Fujita, “Stochastic decision-making in a fuzzy environment,” J. of Operations Research Society of Japan, Vol.38, pp. 467-482, 1995.
[9] S. Iwamoto, T. Ueno and T. Fujita, “Controlled Markov Chains with Utility Functions,” Eds. H. Zhenting, J. A. Filar and A. Chen, Markov Processes and Controlled Markov Chains, Chap.8, pp. 135-148, Kluwer, 2002.
[10] T. Fujita, “An optimal path problem with fuzzy expectation,” Proc. of the 8th Bellman Continuum, pp. 191-195, 2000.
[11] T. Fujita, “An optimal path problem with threshold probability,” Frontiers in Artificial Intelligence and Applications, Vol.69, pp. 792-796, 2001.
[12] T. Fujita, “On nondeterministic dynamic programming,” Mathematical Analysis in Economics (Kyoto, 2005), RIMS Kôkyûroku No.1488, pp. 15-24, 2006.
[13] T. Fujita, “Infinite stage nondeterministic stopped decision processes and maximum linear equation,” Bulletin of the Kyushu Institute of Technology, Vol.55, pp. 15-24, 2008.
[14] T. Fujita, “Deterministic decision process under range constraint through all stages,” Proc. of Modeling Decisions for Artificial Intelligence 2008 (CD-ROM), pp. 60-70, 2008.
[15] T. Fujita, and K. Tsurusaki, “Stochastic optimization of multiplicative functions with negative value,” J. of Operations Research Society of Japan, Vol.41, No.3, pp. 351-373, 1998.
[16] A. Kira, T. Ueno and T. Fujita “Threshold probability of nonterminal type in finite horizon Markov decision processes,” J. of Mathematical Analysis and Applications, Vol.386, pp. 461-472, 2012.
[17] T. Fujita, “Re-examination of Markov policies for additive decision process,” Bulletin of Informatics Cybernetics, Vol.29, No.1, pp. 51-65, 1997.
[18] T. Fujita, “On policy classes in dynamic programming theory,” Proc. of the 9th Bellman Continuum Int. Workshop on Uncertain System and Soft Computing, Series of Information & Management Sciences Vol.2, pp. 39-43, 2002.
[19] U. Bertelé and F. Brioschi, “Nonserial Dynamic Programming,” Academic Press, NY, 1972.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.

[1] [1] R.E. Bellman, “Dynamic Programming,” Princeton Univ. Press, NJ, 1957.

[2] [2] R. A. Howard, “Dynamic Programming and Markov Processes,” MITPress, Cambridge, Mass., 1960.

[3] [3] G. L. Nemhauser, “Introduction to Dynamic Programming,” Wiley, NY, 1966.

[4] [4] D.P. Bertsekas, “Dynamic Programming and Stochastic Control,” Academic Press, NY, 1976.

[5] [5] D.J. White, “Dynamic Programming,” Holden-Day, San Francisco, Calif., 1969.

[6] [6] M. L. Puterman, “Markov Decision Processes : discrete stochastic dynamic programming,” Wiley & Sons, NY, 1994.

[7] [7] M. Sniedovich, “Dynamic Programming,” Marcel Dekker, Inc. NY, 1992.

[8] [8] S. Iwamoto and T. Fujita, “Stochastic decision-making in a fuzzy environment,” J. of Operations Research Society of Japan, Vol.38, pp. 467-482, 1995.

[9] [9] S. Iwamoto, T. Ueno and T. Fujita, “Controlled Markov Chains with Utility Functions,” Eds. H. Zhenting, J. A. Filar and A. Chen, Markov Processes and Controlled Markov Chains, Chap.8, pp. 135-148, Kluwer, 2002.

[10] [10] T. Fujita, “An optimal path problem with fuzzy expectation,” Proc. of the 8th Bellman Continuum, pp. 191-195, 2000.

[11] [11] T. Fujita, “An optimal path problem with threshold probability,” Frontiers in Artificial Intelligence and Applications, Vol.69, pp. 792-796, 2001.

[12] [12] T. Fujita, “On nondeterministic dynamic programming,” Mathematical Analysis in Economics (Kyoto, 2005), RIMS Kôkyûroku No.1488, pp. 15-24, 2006.

[13] [13] T. Fujita, “Infinite stage nondeterministic stopped decision processes and maximum linear equation,” Bulletin of the Kyushu Institute of Technology, Vol.55, pp. 15-24, 2008.

[14] [14] T. Fujita, “Deterministic decision process under range constraint through all stages,” Proc. of Modeling Decisions for Artificial Intelligence 2008 (CD-ROM), pp. 60-70, 2008.

[15] [15] T. Fujita, and K. Tsurusaki, “Stochastic optimization of multiplicative functions with negative value,” J. of Operations Research Society of Japan, Vol.41, No.3, pp. 351-373, 1998.

[16] [16] A. Kira, T. Ueno and T. Fujita “Threshold probability of nonterminal type in finite horizon Markov decision processes,” J. of Mathematical Analysis and Applications, Vol.386, pp. 461-472, 2012.

[17] [17] T. Fujita, “Re-examination of Markov policies for additive decision process,” Bulletin of Informatics Cybernetics, Vol.29, No.1, pp. 51-65, 1997.

[18] [18] T. Fujita, “On policy classes in dynamic programming theory,” Proc. of the 9th Bellman Continuum Int. Workshop on Uncertain System and Soft Computing, Series of Information & Management Sciences Vol.2, pp. 39-43, 2002.

[19] [19] U. Bertelé and F. Brioschi, “Nonserial Dynamic Programming,” Academic Press, NY, 1972.

Mutually Dependent Markov Decision Processes

Toshiharu Fujita* and Akifumi Kira**

Toshiharu Fujita^* and Akifumi Kira^**