Field | Value | Language
dc.contributor.author | O'Sullivan, Michael | en
dc.contributor.author | Saunders, MA | en
dc.date.accessioned | 2011-12-12T23:36:44Z | en
dc.date.issued | 2006 | en
dc.identifier.citation | Stabilizing policy improvement for large-scale infinite-horizon dynamic programming. Report SOL 2006-1. 2006. Stanford University. 1-19 | en
dc.identifier.uri | http://hdl.handle.net/2292/9991 | en
dc.description.abstract | Today's focus on sustainability within industry presents a modeling challenge that may be dealt with using dynamic programming over an infinite time horizon. However, the curse of dimensionality often results in a large number of states in these models. These large-scale models require numerically stable solution methods. The best method for infinite-horizon dynamic programming depends on both the optimality concept considered and the nature of transitions in the system. Previous research uses policy improvement to find strong-present-value optimal policies within normalized systems. A critical step in policy improvement is the calculation of coefficients for the Laurent expansion of the present-value for a given policy. Policy improvement uses these coefficients to search for improvements of that policy. The system of linear equations that yields the coefficients will often be rank-deficient, so a specialized solution method for large singular systems is essential. We focus on implementing policy improvement for systems with substochastic classes (a subset of normalized systems). We present methods for calculating the present-value Laurent expansion coefficients of a policy with substochastic classes. Classifying the states allows for a decomposition of the linear system into a number of smaller linear systems. Each smaller linear system has full rank or is rank-deficient by one. We show how to make repeated use of a rank-revealing LU factorization to solve the smaller systems. In the rank-deficient case, excellent numerical properties are obtained with an extension of Veinott's method [Ann. Math. Statist., 40 (1969), pp. 1635-1660] for substochastic systems. | en
dc.publisher | Stanford University | en
dc.relation.ispartof | Report SOL 2006-1 | en
dc.rights | Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. | en
dc.rights.uri | https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm | en
dc.title | Stabilizing policy improvement for large-scale infinite-horizon dynamic programming | en
dc.type | Report | en
dc.identifier.doi | 10.1137/060653305 | en
pubs.begin-page | 1 | en
dc.rights.holder | Copyright: Stanford University | en
pubs.author-url | http://dl.acm.org/citation.cfm?id=1654920 | en
pubs.commissioning-body | Systems Optimization Laboratory, Management Science and Engineering Department | en
pubs.end-page | 19 | en
dc.rights.accessrights | http://purl.org/eprint/accessRights/RestrictedAccess | en
pubs.subtype | Technical Report | en
pubs.elements-id | 22611 | en
dc.relation.isnodouble | 17007 | *
pubs.org-id | Engineering | en
pubs.org-id | Engineering Science | en
pubs.record-created-at-source-date | 2010-09-01 | en