The asymptotic behavior of undiscounted value iteration in Markov decision problems

Abstract

This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs minus n times the long run average expected reward has a finite limit as n approaches infinity for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
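The quantity studied in the abstract can be illustrated numerically. The sketch below runs undiscounted value iteration, v_{n+1}(i) = max_a [ r(i,a) + sum_j p(j|i,a) v_n(j) ], starting from a final reward vector v_0, and then inspects v_n - n g, where g is the long run average reward. The two-state problem, the function name `value_iteration`, and all numbers are hypothetical and are not taken from the paper. The example is deliberately unichain and aperiodic, a simple case in which the limit exists; the paper itself treats the general multichain case, where periodicity can prevent convergence.

```python
def value_iteration(rewards, transitions, final_reward, horizon):
    """Return [v_0, v_1, ..., v_horizon] for undiscounted value iteration.

    rewards[s][a]        : one-step reward for action a in state s
    transitions[s][a][j] : probability of moving from s to j under a
    final_reward[s]      : terminal reward v_0(s)
    """
    values = [list(final_reward)]
    for _ in range(horizon):
        prev = values[-1]
        nxt = []
        for s in range(len(prev)):
            # Maximize one-step reward plus expected value-to-go.
            best = max(
                rewards[s][a]
                + sum(p * prev[j] for j, p in enumerate(transitions[s][a]))
                for a in range(len(rewards[s]))
            )
            nxt.append(best)
        values.append(nxt)
    return values


# Hypothetical toy problem (an assumption for illustration only).
# State 0 has two actions, state 1 has one action.
rewards = [[1.0, 2.0], [0.0]]
transitions = [
    [[0.5, 0.5], [0.0, 1.0]],  # state 0: action 0, action 1
    [[0.5, 0.5]],              # state 1: single action
]

values = value_iteration(rewards, transitions, [0.0, 0.0], 60)

# For large n the increment v_n - v_{n-1} approximates the maximal gain g.
gain = values[-1][0] - values[-2][0]


def deviation(n):
    """Return the vector v_n - n * gain."""
    return [v - n * gain for v in values[n]]


print("estimated gain:", gain)
print("v_50 - 50 g:", deviation(50))
print("v_60 - 60 g:", deviation(60))
```

In this toy problem the two printed deviation vectors agree to many decimal places, which is the finite-limit behavior the abstract refers to. Replacing the self-loop in state 1 with a deterministic move back to state 0 would make the chain periodic, which is the kind of structure the paper's conditions are designed to handle.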

Authors: Paul Schweitzer and Awi Federgruen

Format: Journal Article

Publication Date: November 1, 1977

Journal: Mathematics of Operations Research

Full Citation

Schweitzer, Paul and Awi Federgruen. “The asymptotic behavior of undiscounted value iteration in Markov decision problems.” Mathematics of Operations Research vol. 2 (November 1, 1977): 360-381.
