Abstract
This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions that guarantee that the maximal total expected reward for a planning horizon of n epochs, minus n times the long-run average expected reward, has a finite limit as n approaches infinity for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
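The quantity studied in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the standard undiscounted recursion v_{n+1}(i) = max_a [ r(i,a) + sum_j P(j|i,a) v_n(j) ] with v_0 as the final reward vector, and it uses two hypothetical two-state toy problems whose long-run average reward (1 per epoch from either state) is supplied by hand rather than computed. The names `value_iteration_step`, `deviations`, `aperiodic`, and `periodic` are illustrative only.

```python
def value_iteration_step(mdp, v):
    """One undiscounted step: v'(i) = max_a [ r(i,a) + sum_j P(j|i,a) * v(j) ]."""
    return [
        max(r + sum(p * v[j] for j, p in enumerate(probs)) for r, probs in actions)
        for actions in mdp
    ]


def deviations(mdp, v0, gain, horizon):
    """Return the vectors v_n - n * gain for n = 1..horizon."""
    v = list(v0)
    out = []
    for n in range(1, horizon + 1):
        v = value_iteration_step(mdp, v)
        out.append([v[i] - n * gain[i] for i in range(len(v))])
    return out


# Hypothetical toy problems: mdp[state] is a list of actions,
# each action being (one-step reward, transition probabilities).
aperiodic = [
    [(2.0, [0.5, 0.5]), (1.0, [0.5, 0.5])],   # second action is dominated
    [(0.0, [0.5, 0.5]), (-1.0, [0.5, 0.5])],  # second action is dominated
]
periodic = [
    [(2.0, [0.0, 1.0])],  # state 0 always moves to state 1
    [(0.0, [1.0, 0.0])],  # state 1 always moves to state 0
]
gain = [1.0, 1.0]  # assumed known: average reward per epoch in both toy problems

for name, mdp in (("aperiodic", aperiodic), ("periodic", periodic)):
    tail = deviations(mdp, [0.0, 0.0], gain, 6)[-2:]
    print(name, tail)
```

In the aperiodic toy problem the deviation settles at a fixed vector, while in the periodic one it alternates between two vectors and has no limit for this final reward vector. This only illustrates why periodicity matters; it does not reproduce the paper's necessary and sufficient conditions.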
Full Citation
Schweitzer, Paul and Awi Federgruen. “The asymptotic behavior of undiscounted value iteration in Markov decision problems.” Mathematics of Operations Research vol. 2 (November 01, 1977): 360-381.