Partially observed Markov decision processes with binomial observations
Introduction
A production process can be in either a “Good” or a “Bad” state. The states form a Markov chain: if the process is in the good state while producing one unit (during one period), there is a constant probability that it deteriorates to the bad state while producing the next unit (during the next period). Once the process enters the bad state it remains there. Units produced in either state may turn out conforming or defective, but the probability of obtaining a conforming unit is larger in the good state than in the bad state.
A controller observes the process periodically over time. The true state is unobservable and can only be inferred from the quality of the output. At the beginning of each period, the controller must select one of two actions: CONTINUE (CON: do nothing) or REPLACE (REP: renew the system for a fixed cost). The objective is to maximize the expected present value of total future profits.
The problem above represents a discrete-time partially observed Markov decision process (POMDP). The fundamental idea is to base actions upon the probability that the system is in the good state. This “good state probability”, also referred to as the “information state”, is updated periodically, using Bayes’ formula.
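The periodic Bayesian update of the good-state probability can be sketched as follows. This is a minimal illustration, not the paper's code; the symbols p and q for the conforming probabilities in the good and bad states, and d for the per-period deterioration probability, are assumed names consistent with the model described later.

```python
def bayes_update(x, conforming, p, q, d):
    """One-period update of the information state (P(Good)).

    x: prior probability that the process was in the Good state
    conforming: True if the observed unit was conforming
    p, q: P(conforming | Good), P(conforming | Bad), with p > q
    d: probability the Good state deteriorates to Bad each period
    """
    # Likelihood of the observation under each state.
    like_good = p if conforming else 1.0 - p
    like_bad = q if conforming else 1.0 - q
    # Bayes' formula: posterior that the unit was produced in the Good state.
    posterior = x * like_good / (x * like_good + (1.0 - x) * like_bad)
    # Predict one step ahead: the Good state survives with probability 1 - d.
    return posterior * (1.0 - d)
```

A conforming unit pulls the information state up (the good state is the more likely explanation); a defective unit pulls it down; deterioration then discounts it by 1 - d.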
POMDPs provide a powerful probabilistic tool for decision making. Structural results have been studied for over thirty years; see [1], [5], [14], [17], [22]. For several computational procedures and algorithms, see [5], [13], [15], [18], [21], [24], [25], [27] and the references therein. The extension of the model to more than two states and more than two actions was discussed in [14], [16], [17], [19].
Several authors have proposed applications of POMDPs to machine maintenance, machine replacement, and quality control; among them are [2], who considered a lot-sizing problem with inspection and non-rigid demand, and [9], who considered a lot-sizing problem with inspection and rigid demand. Applications in other domains may be found in [17], [26], [23]. Givon and Grosfeld-Nir [6] provided an application for optimal control of TV shows. Lane [12] analyzed a POMDP application for fishermen. Aviv and Pazgal [3] studied a pricing problem faced by sellers of fashion-like goods. Hu et al. [10] explored control policies in medical drug therapy. Kaelbling et al. [11] considered navigation scenarios. Smallwood and Sondik [20] looked at object detection scenarios. Ben-Zvi and Nickerson [4] investigated intruder detection strategies.
This paper shows non-monotonic properties of POMDPs. (1) We analytically show how the finite-horizon control limits, as a function of the time remaining, are not necessarily monotonic; (2) we analytically show how the control limits, as a function of probability of obtaining a conforming unit, are not necessarily monotonic. In addition, we prove that the infinite-horizon control limit can be calculated by solving a finite set of linear equations.
Although the first two properties were suggested in previous studies (see, for example, [2], [7], [9]), only numerical examples were given, with no intuition as to why these phenomena occur. The lack of analytical insight into these peculiar properties, together with the complexity of the numerical calculations mentioned above, motivated our study. We emphasize that researchers obtaining such numerical results might be tempted to distrust their calculations, so an analytical treatment is warranted.
Furthermore, the analytical formulas we provide can help test and compare existing POMDP algorithms. These formulas also make it easy to test the sensitivity of the solution to the problem parameters. In addition, we provide several numerical results. These results are innovative in that the calculations do not involve complex programs and are easy to replicate.
Preliminaries
A production process can be in either a “Good” or a “Bad” state. The true state is unobservable and can only be inferred from the quality of the output. Products are classified as either conforming or defective. The probability of obtaining a conforming unit in the Good (Bad) state is p (q). That is, letting C (D) denote the event that a produced unit is conforming (defective), P(C | Good) = p and P(C | Bad) = q. Naturally, we assume that p > q.
The states are probabilistically related through a Markov chain: in each period the good state deteriorates to the absorbing bad state with a constant probability, denoted d.
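For intuition, the process dynamics can be simulated directly. This is an illustrative sketch; p, q, and d are assumed names for the good-state and bad-state conforming probabilities and the deterioration probability.

```python
import random

def simulate(p, q, d, periods, seed=None):
    """Simulate the two-state production chain; the Bad state is absorbing.

    Returns the list of observations (True = conforming unit).
    p, q: conforming probabilities in the Good / Bad state; d: per-period
    probability of deteriorating from Good to Bad.
    """
    rng = random.Random(seed)
    good = True
    observations = []
    for _ in range(periods):
        # The unit's quality depends on the (hidden) current state.
        observations.append(rng.random() < (p if good else q))
        # Deterioration may occur while producing the next unit.
        if good and rng.random() < d:
            good = False  # once Bad, always Bad
    return observations
```

The controller sees only the list of conforming/defective outcomes, never the `good` flag, which is exactly the partial-observability feature of the model.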
The model
We assume that the revenue during each period strictly increases in the quality of the output. We denote by R_G and R_B the (per period) expected revenues in the good and the bad states, respectively; thus, R_G > R_B. The objective is to maximize the expected present value of the total future profits.
Let n be the number of remaining periods and let x denote the information state. We denote by V_n(x) the expected present value of the total future profits if the current action is CONTINUE, the information state is x, and all future actions are
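The finite-horizon recursion can be sketched by backward induction over a grid of information states. The excerpt does not reproduce the optimality equations (7), (8), so the sketch below assumes a standard formulation: CONTINUE earns expected revenue x*r_g + (1-x)*r_b and updates x by Bayes' rule, while REPLACE pays a fixed cost K and restarts from x = 1. All symbols (p, q, d, r_g, r_b, K, beta) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def solve(p, q, d, r_g, r_b, K, beta, horizon, m=401):
    """Finite-horizon DP over a grid of information states x = P(Good).

    Returns the terminal value function and, for each n, the smallest
    grid point at which CONTINUE is optimal (the control limit).
    """
    xs = np.linspace(0.0, 1.0, m)

    def step(x, conforming):
        # Bayes update followed by one-step deterioration.
        lg, lb = (p, q) if conforming else (1 - p, 1 - q)
        post = x * lg / (x * lg + (1 - x) * lb)
        return post * (1 - d)

    V = np.zeros(m)                      # V_0 = 0 (no periods remain)
    limits = []
    for n in range(1, horizon + 1):
        pc = xs * p + (1 - xs) * q       # P(conforming | x)
        cont = (xs * r_g + (1 - xs) * r_b
                + beta * (pc * np.interp(step(xs, True), xs, V)
                          + (1 - pc) * np.interp(step(xs, False), xs, V)))
        rep = -K + cont[-1]              # replace: pay K, restart from x = 1
        V = np.maximum(cont, rep)
        limits.append(xs[np.argmax(cont >= rep)])
    return V, limits
```

Tracking `limits` across n is exactly what is needed to observe the non-monotonicity of the finite-horizon control limits discussed above.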
“Bad in Bad” (BinB)
We refer to the problem in which every unit produced in the bad state ends up defective (bad), i.e., the bad-state conforming probability is zero, as “Bad-in-Bad” (BinB). Hence, after detecting a conforming unit, it is certain that the state was good, so the Bayesian update no longer depends on the prior information state and the optimality equation (7) takes a considerably simpler form.
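The BinB update can be sketched as follows (an illustrative sketch with the assumed symbols p for the good-state conforming probability and d for the deterioration probability): a conforming unit certifies the good state, so the posterior jumps to 1 before deterioration, regardless of the prior.

```python
def binb_update(x, conforming, p, d):
    """Bayes update in the Bad-in-Bad case (bad-state conforming
    probability is zero): only the Good state can produce a conforming
    unit, so observing one makes the Good state certain."""
    if conforming:
        post = 1.0                       # conforming => state was Good
    else:
        # Defective units can come from either state.
        post = x * (1 - p) / (x * (1 - p) + (1 - x))
    return post * (1 - d)                # one-step deterioration
```

After a conforming unit the information state is always 1 - d, which is what makes the BinB recursion tractable.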
Infinite horizon
Letting the number of remaining periods tend to infinity in (7), (8), we have
We refer to (9), (10) as the infinite-horizon optimality equations and to their solution as the infinite-horizon value function. It follows from Theorem 1 and from Grosfeld-Nir [8] that the infinite-horizon value function is convex and strictly increasing, and that there is a unique control limit, x*, so that CON is optimal if and only if the information state is at least x*. The control limit x* is the root of the equation equating the values of the two actions.
Next, we show several cases where the control limit can be calculated by solving a finite set of linear equations.
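Numerically, the infinite-horizon control limit can be approximated by iterating the optimality equations to a fixed point. This is a sketch under the same hypothetical formulation as before (REP pays K and restarts from x = 1; p, q, d, r_g, r_b, beta are assumed names), not the paper's linear-equations method.

```python
import numpy as np

def infinite_horizon_limit(p, q, d, r_g, r_b, K, beta, m=801, tol=1e-10):
    """Approximate the infinite-horizon control limit x* by value
    iteration on a grid of information states (beta < 1 guarantees
    convergence by contraction)."""
    xs = np.linspace(0.0, 1.0, m)

    def step(x, conforming):
        lg, lb = (p, q) if conforming else (1 - p, 1 - q)
        post = x * lg / (x * lg + (1 - x) * lb)
        return post * (1 - d)

    V = np.zeros(m)
    while True:
        pc = xs * p + (1 - xs) * q
        cont = (xs * r_g + (1 - xs) * r_b
                + beta * (pc * np.interp(step(xs, True), xs, V)
                          + (1 - pc) * np.interp(step(xs, False), xs, V)))
        rep = -K + cont[-1]
        V_new = np.maximum(cont, rep)
        if np.max(np.abs(V_new - V)) < tol:
            # x*: smallest information state at which CON is optimal.
            return xs[np.argmax(cont >= rep)]
        V = V_new
```

A large replacement cost pushes x* down to 0 (never replace); a small cost makes replacement attractive and yields a strictly positive control limit.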
Conclusions
The POMDP model is defined by recursive equations, which make analytical insight hard to obtain. We provide an analytical treatment demonstrating some of the peculiar properties of the model. Our analytical formulas are particularly useful for testing and comparing existing POMDP algorithms; they also make it easy to test the sensitivity of the solution to the problem parameters.
The binomial observations model we analyzed is practical and valuable in its own right. Future research could apply it to concrete applications.
References (27)
- Givon, Grosfeld-Nir: Using partially observed Markov processes to select optimal termination time of TV show. Omega (2008)
- Grosfeld-Nir: Control limit for two-state partially observable Markov decision processes. European J. Oper. Res. (2007)
- Kaelbling et al.: Planning and acting in partially observable stochastic domains. Artif. Intell. (1998)
- Markov decision processes. European J. Oper. Res. (1989)
- Structural results for partially observable Markov decision processes. Oper. Res. (1979)
- Anily, Grosfeld-Nir: An optimal lot-sizing and offline inspection policy in the case of nonrigid demand. Oper. Res. (2006)
- Aviv, Pazgal: A partially observed Markov decision process for dynamic pricing. Manag. Sci. (2005)
- Ben-Zvi, Nickerson: Intruder detection: an optimal decision analysis strategy. IEEE Trans. Syst. Man Cybern. (2012)
- Bertsekas: Dynamic Programming and Stochastic Control (1976)
- Grosfeld-Nir: A two-state partially observable Markov decision process with uniformly distributed observations. Oper. Res. (1996)
- Production with rigid demand and costly inspection. Nav. Res. Logist.
- Hu et al.: Comparison of some suboptimal control policies in medical drug therapy. Oper. Res.
- Lane: A partially observable model of decision making by fishermen. Oper. Res.