Stabilizing Policy Improvement for Large-Scale Infinite-Horizon Dynamic Programming

O'Sullivan, Michael; Saunders, MA

Identifier: http://hdl.handle.net/2292/13786

Issue Date: 2009

Reference: SIAM Journal on Matrix Analysis and Applications 31(2):434-459 2009

Rights: Copyright: Society for Industrial and Applied Mathematics

Rights (URI): https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm

Abstract:

Today’s focus on sustainability within industry presents a modeling challenge that may be dealt with using dynamic programming over an infinite time horizon. However, the curse of dimensionality often results in a large number of states in these models. These large-scale models require numerically stable solution methods. The best method for infinite-horizon dynamic programming depends on both the optimality concept considered and the nature of transitions in the system. Previous research uses policy improvement to find strong-present-value optimal policies within normalized systems. A critical step in policy improvement is the calculation of coefficients for the Laurent expansion of the present-value for a given policy. Policy improvement uses these coefficients to search for improvements of that policy. The system of linear equations that yields the coefficients will often be rank-deficient, so a specialized solution method for large singular systems is essential. We focus on implementing policy improvement for systems with substochastic classes (a subset of normalized systems). We present methods for calculating the present-value Laurent expansion coefficients of a policy with substochastic classes. Classifying the states allows for a decomposition of the linear system into a number of smaller linear systems. Each smaller linear system has full rank or is rank-deficient by one. We show how to make repeated use of a rank-revealing LU factorization to solve the smaller systems. In the rank-deficient case, excellent numerical properties are obtained with an extension of Veinott’s method [Ann. Math. Statist., 40 (1969), pp. 1635–1660] for substochastic systems.
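To illustrate what the abstract means by Laurent expansion coefficients obtained from a rank-deficient linear system, here is a minimal sketch that is not taken from the paper. It assumes a single irreducible stochastic chain (simpler than the substochastic classes treated in the article), the interest-rate form V(rho) = (rho*I + I - P)^{-1} r of the present value, and a dense least-squares solve in place of the rank-revealing LU factorization the authors use. The function name `gain_and_bias` and the two-state data are hypothetical. The first two coefficients are the gain g and the bias h, so that V(rho) = g/rho + h + O(rho).

```python
import numpy as np

def gain_and_bias(P, r):
    """Gain g and bias h for one irreducible stochastic chain (illustrative only)."""
    n = P.shape[0]
    A = np.eye(n) - P  # singular: rank-deficient by one for an irreducible chain

    # Stationary distribution: pi @ A = 0, normalized so that sum(pi) = 1.
    M = np.vstack([A.T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi = np.linalg.lstsq(M, rhs, rcond=None)[0]

    # Leading coefficient (order 1/rho): the gain, constant across states here.
    g = float(pi @ r)

    # Next coefficient: A h = r - g*1 has infinitely many solutions;
    # the extra row pi @ h = 0 selects the bias.
    B = np.vstack([A, pi[None, :]])
    h = np.linalg.lstsq(B, np.append(r - g, 0.0), rcond=None)[0]
    return g, h, pi

# Hypothetical two-state example.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 3.0])
g, h, pi = gain_and_bias(P, r)

# Compare with the present value at a small interest rate rho.
rho = 1e-4
V = np.linalg.solve(rho * np.eye(2) + np.eye(2) - P, r)
approx = g / rho + h  # truncated expansion; the error is O(rho)
```

For large sparse models a dense least-squares solve would not scale, which is the setting where the decomposition into smaller systems and the rank-revealing LU factorization described in the abstract apply.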

DOI: 10.1137/060653305