INTRODUCTION
While a production process produces items, an operator or controller observes it over time and, from the quality of the output, classifies the process as being in either a good or a bad state. At the beginning of each period the operator must make one of two decisions: do nothing (continue) and accept having defective products, or renew (replace) the system and pay a fixed cost (halt). The process deteriorates stochastically over time, i.e., if the process is in the good state during one period, there is a constant probability that it will be in the bad state during the next period. Decisions and state transitions are considered instantaneous. The objective is to maximize the expected discounted value of the total future profits. This model, which represents a partially observable Markov decision problem, has been discussed by many researchers and its applications in many areas can be found in Monahan (1982), Ross (1983), White (1988), Valdez-Flores and Feldman (1989), Scarf (1997) and Wang (2002).
We intuitively expect the quality of the output in the good state to be higher than in the bad state. There are two popular ways of modeling this notion: (a) stochastic dominance, i.e., the quality of the output is stochastically higher in the good state than in the bad state and (b) dominance in expectation, i.e., the expected quality of the output in the good state is higher than in the bad state. In the application of these models, it can be shown that the optimal policy initiates a maintenance (or a replacement) of the operating device if the degree of its deterioration is greater than or equal to a critical level. Such a policy is usually called a control-limit policy (Kyriakidis and Dimitrakos, 2006). In other words, the optimal policy has a control limit and the optimal decision is to continue if and only if the probability that the process is in the good state exceeds the control limit.
While Albright (1979), Bertsekas (1976), Lovejoy (1987) and White (1979) used the stochastic dominance condition in their modeling, Grosfeld-Nir (2007) showed that dominance in expectation suffices for the optimality of a control-limit policy, which makes the partially observable Markov decision problem more widely applicable.
Tagaras (1988) studied the joint process control and machine maintenance problem of a Markovian deteriorating machine. Assuming that sampling and preventive maintenance were performed at fixed intervals, he searched numerically for the control chart limits, preventive maintenance interval and sampling interval that minimize the time-average cost related to maintenance and quality control.
Kuo (2006) studied the joint machine maintenance and product quality control problem in which both the timing of the sampling action and the sample size were directly included in the action space of the dynamic programming model of the system. Unlike previous studies in this area, he did not impose a mandatory fixed sample size and fixed sampling intervals on the system. Instead, he let the dynamic programming mechanism dictate the best sample size and sampling epoch based on the current state of the system. He derived some properties of the objective function that minimized the expected total discounted system cost in the value iteration algorithm of the dynamic programming model.
In many real-world decision-making problems (such as deciding whether to continue or halt a production process, whether to replace or repair a specific machine, or whether the available data come from a certain probability distribution), we first divide the space of possible solutions into smaller subspaces (the solution being one of the subspaces), then assign a probability measure to every subspace based on our experience and finally, based on the current situation, update the probabilities and make the decision. In production environments, one of the probability measures in a decision-making problem is the time between producing defective products. Assuming a certain probability distribution, we may estimate its parameters from the information in a sample taken from the process at any time. If the value of this parameter is less than a given threshold, we will halt the process.
PROBLEM STATEMENT
Assuming that the true state of a production process at any stage can be indirectly measured in terms of the number of defective items produced by the process (Sinuany-Stern et al., 1997) and that the time between defective products follows an exponential distribution, in this study we first estimate the parameter of this distribution (the hazard rate λ) within a sequential decision-making framework. Then, if λ is less than a threshold, the production process continues and we accept the cost associated with producing defective products. Otherwise, we halt the process, accepting the cost of unsatisfied customers together with the corrective maintenance costs.
In order to estimate the hazard rate (λ) at any stage of the sampling process, in this research we propose a sequential decision-making framework for high-yield production processes such that not only are the total costs of unsatisfied customers, defective products and corrective maintenance minimized, but the probability of making the correct decision is also maximized. In other words, at the beginning of each period we want either to continue production and supply the customers' demands based on the current process condition, or to halt the process and perform maintenance while not being able to satisfy the demands.
In order to make the proposed method more realistic, we assume that λ is a random variable that follows a gamma distribution. Then, using an objective function defined in terms of cost and risk, we derive several properties of the optimal value function, which help us find the optimal policy for a vendor either to satisfy the customer order or to halt the production process and take corrective maintenance action in any period. This policy is derived from a stochastic dynamic programming and Bayesian estimation approach that provides an optimal framework for the decision-making process at hand.
THE MODEL
In production processes, when we are to decide between producing and not producing a batch, the state of the process is stochastic and we can never say with certainty whether a batch should or should not be produced. Since the stochastic state of the process may be dynamic, we may use the concept of stochastic dynamic programming to model such problems.
Some researchers have combined sequential analysis inference with the optimal stopping problem to determine the probability of making the correct decision. One such study is a new approach to fitting a probability distribution to given statistical data, which Eshragh and Modarres (2001) named Decision On Belief (DOB). In this decision-making method, a sequential analysis approach is employed to find the best underlying probability distribution of the observed data. Moreover, Eshragh and Niaki (2006) applied the DOB concept as a decision-making tool in response surface methodology. In this study, we use the concept of DOB to model the problem. Before doing so, however, we need some notations and definitions.
Notations and definitions: We will use the following notations and definitions in the rest of the study:
We illustrate an application of the proposed approach by taking the time between defective products to follow an exponential distribution with hazard rate λ.
Let ti denote the time between the production of the (i-1)th and the ith defective products in a production cycle. Suppose that m defective products are produced during the cycle and that a non-informative prior is used by letting the parameters of the gamma prior converge to zero, i.e., the prior distribution of λ is gamma (0,0). Then, using Bayesian inference, the posterior distribution of λ is also gamma, with parameters α = m and β = Σ ti (Nair et al., 2001). In other words:
$$f(\lambda) = \frac{\beta^{\alpha}\,\lambda^{\alpha-1}\,e^{-\beta\lambda}}{\Gamma(\alpha)}, \qquad \alpha = m, \quad \beta = \sum_{i=1}^{m} t_i \qquad (1)$$
where:
f: The probability density function of λ.
R: The cost of halting the production process (it includes the cost of not satisfying the customer order and the cost of maintenance actions).
C: The cost of having one defective product in an order.
Vn(λ): The cost associated with λ when there are n remaining stages to make the decision.
Wn(λ): The probability of correct choice associated with λ when there are n remaining stages to make the decision.
dn: The upper threshold for λ. If the hazard rate is more than dn, then we halt the production process.
d´n: The lower threshold for λ. If the hazard rate is less than d´n, we continue the production process.
δ1: The maximum acceptable level of the batch quality (Accepted Quality Level (AQL)).
δ2: The minimum rejectable level of the batch quality (Lot Tolerance Proportion Defective (LTPD)).
λ1: The maximum acceptable level of the hazard rate.
λ2: The minimum rejectable level of the hazard rate.
CS: The event of making the correct decision.
ε1: The size of the type-one error in making a decision.
ε2: The size of the type-two error in making a decision.
H: The default time to produce the product.
D: The total number of products in an order.
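The Bayesian update described above can be sketched in a few lines. This is a minimal illustration, assuming exponentially distributed times between defectives and the gamma (0,0) prior, so that the posterior has shape m and rate equal to the total observed time. The function name `posterior_from_times` and the sample times are hypothetical; the times are chosen only so that the result matches the α = 5, β = 80 used in numerical example 1.

```python
def posterior_from_times(times):
    """Return (alpha, beta) of the gamma posterior of the hazard rate.

    Assumes exponential times between defectives and the non-informative
    gamma(0, 0) prior, so alpha = m and beta = sum of the observed times.
    """
    alpha = len(times)        # m, the number of defective products observed
    beta = float(sum(times))  # total observed time between defectives
    return alpha, beta


# Hypothetical inter-defect times (not taken from the source).
times = [10.0, 20.0, 15.0, 25.0, 10.0]
alpha, beta = posterior_from_times(times)
posterior_mean = alpha / beta  # expected hazard rate under the posterior
print(alpha, beta, posterior_mean)  # 5 80.0 0.0625
```

The posterior mean α/β is the quantity that is later compared with the thresholds when a decision is made.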
Derivations: We may model the described decision-making problem as an optimal stopping problem in which, at each stage of the decision-making process, we take a sample from a batch and, based on the information obtained from the sample, decide whether to halt production, to continue production or to take more samples.
We mentioned that the hazard rate (λ) could be modeled as a gamma-distributed random variable. Hence, P(λ≥dn) is the probability of halting the production process and P(λ≤d´n) is the probability of continuing it. Then, by the total probability theorem, [1-P(λ≥dn)-P(λ≤d´n)] is the probability of neither halting nor continuing and hence of taking more samples. We note that for this third probability to be non-negative we need dn≥d´n.
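As a rough illustration of these three probabilities, the sketch below evaluates them for a gamma posterior with integer shape (which is the case here, since the shape is the number of observed defectives), using the closed-form Erlang distribution function. The helper names and the threshold values 0.09 and 0.05 are hypothetical and serve only as an example.

```python
import math


def erlang_cdf(x, shape, rate):
    """CDF of a gamma distribution with integer shape and the given rate."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    mu = rate * x
    term, total = 1.0, 1.0  # k = 0 term of the Poisson sum
    for k in range(1, shape):
        term *= mu / k
        total += term
    return 1.0 - math.exp(-mu) * total


def action_probabilities(d_upper, d_lower, shape, rate):
    """Probabilities of halting, continuing and taking more samples."""
    if d_upper < d_lower:
        raise ValueError("the upper threshold must not be below the lower one")
    p_halt = 1.0 - erlang_cdf(d_upper, shape, rate)  # P(lambda >= d_n)
    p_continue = erlang_cdf(d_lower, shape, rate)    # P(lambda <= d'_n)
    p_sample = 1.0 - p_halt - p_continue             # remaining probability
    return p_halt, p_continue, p_sample


# Hypothetical thresholds with the posterior gamma(5, 80).
p_halt, p_continue, p_sample = action_probabilities(0.09, 0.05, shape=5, rate=80.0)
print(p_halt, p_continue, p_sample)
```

Setting the upper threshold to infinity makes the halting probability zero, and setting the lower threshold to zero makes the continuing probability zero.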
If we define n to be the index of the decision-making stage and λ to be the state variable, then RP(λ≥dn) is the cost when we halt the production process, CHλP(λ≤d´n) is the cost when we continue the production process and α´Vn-1(λ) is the cost when we proceed to the next stage. The discount factor α´ is needed to evaluate the cost of the next stage at the current stage (according to the approach of stochastic dynamic programming). Hence, we can define the stochastic dynamic equation of the cost as:
Then the cost associated with λ when there are n remaining stages to make
the decision is:
However, we defined CS as the event of correct decision, so we will have:
It is obvious that
Hence,
we have
Now we can define the stochastic dynamic equation of making the correct decision
as:
Since we are to minimize the objective function given in Eq.
3 and maximize the objective function of Eq. 5 simultaneously,
based on the ratio of the cost to (1-risk) criterion we combine these two equations
in one as:
It is obvious that this function should be minimized. In theorem 1, we will
show that the minimum value of Hn(λ) occurs at the boundary
limits of dn and d´n.
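The cost terms described above can be put together in a small sketch. The recursion used here is an assumption, since the cost equation itself is not reproduced in this text: it adds the halting cost R·P(λ≥dn), the continuing cost C·H·λ·P(λ≤d´n) and the discounted cost of the previous stage weighted by the probability of taking more samples, with a zero cost when no stages remain. The function names and the discount value 0.9 are hypothetical. With the second-stage inputs of numerical example 1 (taking α = 10 and β = 220, an assumption consistent with the reported mean hazard rate of 0.0454), this assumed form gives a value close to the reported V1 = 39.76.

```python
import math


def erlang_cdf(x, shape, rate):
    """CDF of a gamma distribution with integer shape and the given rate."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    mu = rate * x
    term, total = 1.0, 1.0
    for k in range(1, shape):
        term *= mu / k
        total += term
    return 1.0 - math.exp(-mu) * total


def stage_cost(lam, d_upper, d_lower, shape, rate, R, C, H, discount, v_prev=0.0):
    """One step of the ASSUMED cost recursion (v_prev is the earlier stage cost)."""
    p_halt = 1.0 - erlang_cdf(d_upper, shape, rate)
    p_continue = erlang_cdf(d_lower, shape, rate)
    p_sample = 1.0 - p_halt - p_continue
    return R * p_halt + C * H * lam * p_continue + discount * v_prev * p_sample


# Inputs follow numerical example 1 (second stage); beta = 220 and the
# discount factor 0.9 are assumptions, not values stated in the source.
v1 = stage_cost(lam=10 / 220, d_upper=math.inf, d_lower=0.0622,
                shape=10, rate=220.0, R=100.0, C=1.0, H=1000.0, discount=0.9)
print(round(v1, 2))
```

The ratio of this cost to the probability of correct choice would then give the combined objective, but the probability of correct choice is not sketched here because its defining equation is not available in this text.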
Theorem 1: The optimal value of Hn(λ) in Eq. 6 occurs at the boundary limits of dn and d´n.
Proof: We take the first derivatives of Hn(λ) in Eq. 6 with respect to dn and d´n and set both equal to zero. That is,
In other words:
As Eq. 7 and 8 share the same left-hand side, their right-hand sides must be equal. However, in general the right-hand sides cannot be equal. Hence, we conclude that at most one of the derivatives can be equal to zero.
Assume the derivative in (a) is equal to zero; then the derivative in (b) is not equal to zero and we conclude that the optimal value of d´n lies on its boundary limits. However, if we expand equation (7), we will have:
This is a contradiction, because we showed that the optimal value of d´n lies on its boundary limits. Hence, neither of the derivatives (7) and (8) is equal to zero and the optimal values of dn and d´n lie on their boundary limits. When the derivative in (b) is assumed to be equal to zero, the reasoning is similar.
In order to determine the boundary limits of dn and d´n, we use the concept of type-one and type-two errors. The type-one error is the probability of halting the production process when its hazard rate is acceptable and the type-two error is the probability of continuing the production process when its hazard rate is not acceptable. Thus, on one hand, if λ≤λ1, the probability of halting the production process must be smaller than ε1 and, on the other hand, if λ≥λ2, the probability of continuing the production process must be smaller than ε2.
Hence, as the mean of the gamma distribution is α/β, for a process in a good state we have:
In this case, the probability of halting the production process (type-one error)
is
where f(λ) is the probability density function of a gamma distribution with parameters α and β, and F(dn) is the cumulative distribution function of λ evaluated at dn.
Since F(dn) is an increasing function, if we define th1 to be the boundary limit of dn, we have:
Similarly, defining th2 to be the boundary limit of d´n,
for a production process being in a bad state we have:
In this case, the probability of continuing the production process (type-two
error) is
where f(λ) is the probability density function of a gamma distribution with parameters α and β, and F(d´n) is the cumulative distribution function of λ evaluated at d´n.
Hence
Now, since the optimal values of dn and d´n lie on the boundary limits, we can use the framework given in Eq. 11 to make the optimum decision. In this equation, α/β is the mean of the gamma distribution for λ given in Eq. 1.
In order to identify the signs of the derivatives in Eq. 11, Wn(λ) and Vn(λ) must be evaluated, which in turn requires the values of dn and d´n. We showed that these values are at their boundaries, resulting in four cases: the combinations of dn = ∞ or dn = th1 with d´n = 0 or d´n = th2. We evaluate the objective function given in Eq. 6 for these cases and pick the one with the lowest value. Then, we compare the mean hazard rate, α/β, with the optimum boundary points of the objective function and make the decision based upon the framework given in (11).
In the decision-making framework given in (11), whenever the expected hazard rate at a given stage of the sampling process lies between th2 and th1, we continue sampling in the next stage. Since this event occurs with a low probability, the probability of not having made a decision by stage n goes to zero as n becomes large. In other words, the proposed method eventually reaches a decision.
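A simplified reading of this decision step can be sketched as follows. It is not a reproduction of the framework in (11), which also involves the signs of the derivatives; it only compares the mean hazard rate with the selected boundary values, which is enough to reproduce the three outcomes reported in the numerical examples. The function name `decide` and the returned labels are hypothetical.

```python
import math


def decide(mean_rate, d_upper, d_lower):
    """Compare the expected hazard rate with the selected boundary values."""
    if mean_rate >= d_upper:
        return "halt"      # hazard rate too high: stop and maintain
    if mean_rate <= d_lower:
        return "continue"  # hazard rate acceptable: keep producing
    return "sample"        # undecided: go to the next sampling stage


# Boundary values and mean hazard rates as reported in the numerical examples.
print(decide(0.0625, math.inf, 0.048))   # sample
print(decide(0.0454, math.inf, 0.0622))  # continue
print(decide(0.125, 0.091, 0.0))         # halt
```

An upper boundary of infinity rules out halting and a lower boundary of zero rules out continuing, so each of the four boundary cases restricts which actions are available at the stage.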
In summary, we propose the following algorithm to solve the problem at hand:
THE SOLUTION ALGORITHM
According to what we derived in the previous section, the steps involved in the solution algorithm are:
Step 1: Based on the given values of the parameters α, β, R, H, C, D, ε1, ε2, δ1 and δ2, in the first stage, n = 1, we define Hn(λ) using Eq. 6.
Step 2: Using Eq. 9 and 10 and numerical integration, we determine th1 and th2 as the thresholds of d1 and d´1, respectively.
Step 3: Knowing that the optimal value of Hn(λ) can only occur in one of the four cases (d1 = ∞, d´1 = 0), (d1 = ∞, d´1 = th2), (d1 = th1, d´1 = 0) and (d1 = th1, d´1 = th2), we evaluate Hn(λ) at these points and pick the point with the minimum value of Hn(λ).
Step 4: We employ the framework given in (11) to make the decision at the current stage. If the optimal decision is to go to the next stage, then we go to step 5. Otherwise, we stop the decision-making process.
Step 5: Set n = n + 1 and determine the optimal value of Hn-1(λ). Then, go to step 1.
We note that in order to evaluate the optimal value of Hn-1(λ)
in step 5, we need to calculate the optimal values of Hn-2(λ),
Hn-3(λ),... and H1(λ).
The flowchart given in Fig. 1 summarizes the steps involved
in the proposed algorithm.
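The enumeration of the four boundary cases in the algorithm can be sketched as below. The objective function is passed in as a parameter because Eq. 6 is not reproduced in this text; `toy_objective` is a purely hypothetical stand-in used only to make the example runnable, and it does not represent the actual ratio of cost to probability of correct choice.

```python
import itertools
import math


def best_boundaries(objective, th1, th2):
    """Evaluate the objective at the four boundary cases and return the best."""
    candidates = itertools.product((math.inf, th1), (0.0, th2))
    return min(candidates, key=lambda pair: objective(*pair))


def toy_objective(d_upper, d_lower):
    """Hypothetical stand-in for H_n; NOT the objective of the source."""
    halting_penalty = 0.0 if math.isinf(d_upper) else 1.0
    return halting_penalty + abs(d_lower - 0.048)


d_upper, d_lower = best_boundaries(toy_objective, th1=0.091, th2=0.048)
print(d_upper, d_lower)  # inf 0.048
```

In an actual implementation, the objective would be evaluated recursively, since the value at stage n depends on the optimal values at the earlier stages.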
Numerical example 1: In this example, the parameters are set such that the optimal decision is made in stage 2 of the decision-making framework. Suppose α = 5, β = 80, R = 100, H = 1000, C = 1, D = 1000, ε1 = 0.05, ε2 = 0.1, δ1 = 0.04 and δ2 = 0.1. Knowing that λ ∼ Γ(α = 5, β = 80), in the first step of the algorithm we define:
In the second step, using Eq. 9 and 10
for the production process to be in good and bad states, respectively, we have:
which numerically evaluate to th1 = 0.091 and th2 = 0.048.
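These thresholds can be approximately reproduced with a short sketch. It rests on assumptions: the good state is taken as a gamma distribution with shape α and mean λ1, the bad state as a gamma distribution with shape α and mean λ2, th1 is the (1 - ε1) quantile of the former and th2 the ε2 quantile of the latter. The values λ1 = 0.05 and λ2 = 0.1 are borrowed from the error study and are used here because they give results in line with the reported thresholds. The helper names are hypothetical.

```python
import math


def erlang_cdf(x, shape, rate):
    """CDF of a gamma distribution with integer shape and the given rate."""
    if x <= 0:
        return 0.0
    mu = rate * x
    term, total = 1.0, 1.0
    for k in range(1, shape):
        term *= mu / k
        total += term
    return 1.0 - math.exp(-mu) * total


def erlang_quantile(p, shape, rate, tol=1e-10):
    """Quantile found by bisection on the increasing distribution function."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, rate) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def thresholds(alpha, lam1, lam2, eps1, eps2):
    """Boundary limits under the ASSUMED good-state and bad-state models."""
    th1 = erlang_quantile(1.0 - eps1, alpha, alpha / lam1)  # type-one error bound
    th2 = erlang_quantile(eps2, alpha, alpha / lam2)        # type-two error bound
    return th1, th2


th1, th2 = thresholds(alpha=5, lam1=0.05, lam2=0.1, eps1=0.05, eps2=0.1)
print(round(th1, 4), round(th2, 4))
```

Under these assumptions the thresholds depend on α, λ1, λ2, ε1 and ε2 but not on β, which is consistent with the same value of th1 appearing in both numerical examples.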
Then in the third step of the algorithm, we evaluate the objective function for the different possible boundary values of d1 and d´1 and choose the pair that minimizes the objective function, i.e.:
Hence, the optimum values for d1 and d´1 are d1
= ∞, d´1 = 0.048.
In the fourth step of the solution algorithm, since the expected hazard rate
is equal to the mean of the Gamma distribution with parameters α = 5, β
= 80, that is 0.0625, we are in state (1-a) of the decision tree in Eq.
11 and should continue to the next stage.
In stage n = 2, assume α = 10 and β = 220. According to the solution algorithm, first we should determine
In the second step, using Eq. 9 and 10
for good and bad states we have
Then in the third step of the algorithm, we evaluate the objective function for the different possible boundary values of d1 and d´1 and choose the pair that minimizes the objective function, i.e.:
Fig. 1: The flowchart of the proposed algorithm
Hence, the optimum values for d1 and d´1 are d1 = ∞ and d´1 = 0.0622, with V1(0.0454) = 39.76 and W1(0.0454) = 0.337, which enables us to calculate the values of V2(0.0454) and W2(0.0454).
Then in the fourth step of the algorithm, we evaluate the objective function for different possible boundary values of d2,d´2 and then choose d2 and d´2 that minimizes the objective function, i.e.:
In the fifth step of the solution algorithm, since the expected hazard rate
is equal to the mean of the Gamma distribution with parameters α = 10,
that is 0.0454, we are in state (1-a) of the decision tree in Eq.
11 and should continue the production process.
Numerical example 2: In this example, the optimal solution is to halt the production process at the first stage of the sampling process. Suppose α = 5 and β = 40, with the other parameters the same as in numerical example 1.
In the second step, using Eq. 9 and 10
for a good and a bad process, respectively, we obtain
Then, in the third step of the algorithm, we evaluate the objective function
for different possible boundary values of d1,d´1
and then choose d1 and d´1 that minimizes the objective
function, i.e:
Hence, the optimum values for d1 and d´1 are d1
= 0.091, d´1 = 0.
In the fourth step of the solution algorithm, since the expected hazard rate
is equal to the mean of the Gamma distribution with parameters α = 5, β
= 40, that is 0.125, we are in state (4-a) of the decision tree in Eq.
11 and should halt the production process and do maintenance action.
An error study: Here, we investigate the performance of the proposed method in terms of type-one and type-two error. To do this, let us consider the simplest case where we have only one stage for the decision-making process (n = 1). The objective function for this stage is:
It can be easily shown that, to minimize this objective function, either P(λ≥d1) or P(λ≤d´1) should be equal to zero. To prove it, suppose that at the minimum value of the objective function both P(λ≥d1) and P(λ≤d´1) are greater than zero; this leads to a contradiction. Hence, we should only consider two cases:
Case 1: 
Case 2: 
For given values of R = 100, C = 1, D = 1000, a defective production rate of
0.05, type-two error = 0.1, λ1 = 0.05 and λ2
= 0.1, based on the information from some simulated samples of different sizes,
suppose we want to estimate the size of type-one error of the proposed method.
Table 1 shows the results of this estimation.
The first column of Table 1 shows samples with different
sizes taken for evaluation. The second column is the threshold value of the
hazard rate of the production process. It means that if the hazard rate (defective
product rate) is less than this threshold, we continue producing items. In column
three, the probability of the right decision (continuing the production process)
has been given. This probability has been calculated by Gamma distribution.
The results of Table 1 show that, as expected, the type-one
error associated with the performance of the proposed methodology decreases
as the sample size increases. However, as the cost increases with an increase
in the sample size, we need to determine the optimum value of the sample size.
Based on the information given in Table 1, we may estimate the probability of making the correct decision (accepting the batch and continuing the process) as a function of the sample size. The regression function, fitted using Excel, is shown in Fig. 2; its coefficient of determination is 0.8356.
Let C1 denote the fixed cost associated with a sample of size one, R the cost of halting the process and y the probability of correct decision given by the regression function y = 0.0243ln(x) + 0.8989. Then the cost function of taking a sample of size x is:
Fig. 2: Regression function for the probability of correct decision
Table 1: Type-one error estimation of the proposed method for different samples
Table 2: Type-two error estimation of the proposed method for different samples
which has its minimum value at
In order to estimate the type-two error of the proposed method, consider a process with a defective product rate of 0.1. The probability of making a wrong decision (continuing the production process) has been calculated for different sample sizes and is given in Table 2. The results of Table 2 indicate that, as the sample size increases, the probability of accepting a batch with an unacceptable defective rate decreases, implying a similar trade-off between the cost of sampling and the probability of correct decision.
The evaluation study for the cases in which the decision is made in stage n>1 can be performed in a similar way.
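The trade-off between the cost of sampling and the probability of correct decision can be illustrated with a small sketch. The cost function used here is an assumption, since the corresponding equation is not reproduced in this text: it adds a sampling cost C1·x to the halting cost R weighted by the probability of a wrong decision, 1 - y(x), with y(x) taken from the fitted regression function. The value C1 = 0.1 and the function names are hypothetical.

```python
import math


def p_correct(x):
    """Fitted regression for the probability of correct decision, capped at 1."""
    return min(1.0, 0.0243 * math.log(x) + 0.8989)


def expected_cost(x, c1, R):
    """ASSUMED cost: sampling cost plus halting cost times the error probability."""
    return c1 * x + R * (1.0 - p_correct(x))


def best_sample_size(c1, R, max_size=1000):
    """Integer sample size with the lowest assumed cost."""
    return min(range(1, max_size + 1), key=lambda x: expected_cost(x, c1, R))


# R = 100 as in the examples; c1 = 0.1 is a hypothetical sampling cost.
x_star = best_sample_size(c1=0.1, R=100.0)
print(x_star)  # 24
```

Under this assumed form, setting the derivative C1 - 0.0243R/x to zero gives the continuous minimizer x = 0.0243R/C1, which is 24.3 for the values above and agrees with the integer search.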
CONCLUSIONS
In this research, we applied Bayesian inference and stochastic dynamic programming to model a decision-making problem in production environments in which we observe the time between breakdowns based on the defective items produced. Assuming that the time between breakdowns follows an exponential distribution with parameter λ, we proposed a sequential decision-making framework to estimate λ at any stage of the sampling process such that not only are the total costs of unsatisfied customers, defective products and corrective maintenance minimized, but the probability of making the correct decision is also maximized. In order to demonstrate the application of the proposed framework, we provided two numerical examples.
For further research, we propose either to consider some other objective functions or to employ other functions to model the state of the system. Moreover, we can employ other functions to model the probability of correct choice when the production process is accepted or rejected.