Abstract:
Markov decision processes (MDPs) are a standard model in Artificial Intelligence planning, used to construct optimal or near-optimal policies or plans. One aspect often missing from discussions of planning in stochastic environments is how MDPs handle safety constraints expressed as a probability of reaching threat states. We introduce a method for finding a value-optimal policy that satisfies such a safety constraint, and report on the validity and effectiveness of our method through a set of experiments.