Stabilizing policy improvement for large-scale infinite-horizon dynamic programming

dc.contributor.author O'Sullivan, Michael en
dc.contributor.author Saunders, M. A. en
dc.date.accessioned 2011-12-12T23:36:44Z en
dc.date.issued 2006 en
dc.identifier.citation O'Sullivan, M., and Saunders, M. A. Stabilizing policy improvement for large-scale infinite-horizon dynamic programming. Report SOL 2006-1, Stanford University, 2006, pp. 1–19. en
dc.identifier.uri http://hdl.handle.net/2292/9991 en
dc.description.abstract Today's focus on sustainability within industry presents a modeling challenge that can be addressed with dynamic programming over an infinite time horizon. However, the curse of dimensionality often gives these models a very large number of states, and such large-scale models require numerically stable solution methods. The best method for infinite-horizon dynamic programming depends on both the optimality concept considered and the nature of transitions in the system. Previous research uses policy improvement to find strong-present-value optimal policies within normalized systems. A critical step in policy improvement is the calculation of the coefficients of the Laurent expansion of the present value of a given policy; policy improvement uses these coefficients to search for improvements of that policy. The system of linear equations that yields the coefficients is often rank-deficient, so a specialized solution method for large singular systems is essential. We focus on implementing policy improvement for systems with substochastic classes (a subset of normalized systems). We present methods for calculating the present-value Laurent expansion coefficients of a policy with substochastic classes. Classifying the states allows the linear system to be decomposed into a number of smaller linear systems, each of which has full rank or is rank-deficient by one. We show how to make repeated use of a rank-revealing LU factorization to solve the smaller systems. In the rank-deficient case, excellent numerical properties are obtained with an extension of Veinott's method [Ann. Math. Statist., 40 (1969), pp. 1635–1660] for substochastic systems. en
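To make the evaluation step in the abstract concrete, below is a minimal Python sketch (not the report's method) of computing the first two Laurent coefficients of a policy's present value, the gain g and bias h, for a single recurrent (unichain) stochastic class. The evaluation equations (I - P)g = 0 and g + (I - P)h = r are rank-deficient by one, exactly the situation the abstract describes; this sketch resolves the deficiency by appending the normalization sum(h) = 0 and calling a dense least-squares solver, rather than the rank-revealing LU factorization and Veinott-style extension used in the report. The function name gain_bias and the normalization choice are illustrative assumptions.

import numpy as np

def gain_bias(P, r):
    """Gain g and bias h: the first two Laurent coefficients of the
    present value of a fixed policy, for a single recurrent (unichain)
    stochastic class with transition matrix P and reward vector r.

    The evaluation equations

        (I - P) g = 0,    g + (I - P) h = r

    are rank-deficient by one, so the bias is pinned down here by the
    extra normalization sum(h) = 0 and a least-squares solve. This is
    an illustrative stand-in for the rank-revealing LU factorization
    and Veinott-style approach described in the report.
    """
    n = len(r)
    A = np.eye(n) - P
    # Stationary distribution pi: the left null vector of (I - P),
    # taken from the SVD (one-dimensional null space for unichain P).
    _, _, vt = np.linalg.svd(A.T)
    pi = np.abs(vt[-1])
    pi /= pi.sum()
    g = float(pi @ r) * np.ones(n)   # constant gain across states
    # Bias: solve (I - P) h = r - g with the appended row sum(h) = 0.
    M = np.vstack([A, np.ones((1, n))])
    b = np.concatenate([r - g, [0.0]])
    h, *_ = np.linalg.lstsq(M, b, rcond=None)
    return g, h

# Two-state example: g is the long-run average reward (2/3 in both
# states); h = (5/3, -5/3) reflects the transient advantage of state 1.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
g, h = gain_bias(P, r)
print(g, h)

At scale, the dense SVD and least-squares calls above cost O(n^3) and ignore sparsity, which is why the report instead reuses a rank-revealing LU factorization across the smaller class-by-class systems.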
dc.publisher Stanford University en
dc.relation.ispartof Report SOL 2006-1 en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.title Stabilizing policy improvement for large-scale infinite-horizon dynamic programming en
dc.type Report en
dc.identifier.doi 10.1137/060653305 en
pubs.begin-page 1 en
dc.rights.holder Copyright: Stanford University en
pubs.author-url http://dl.acm.org/citation.cfm?id=1654920 en
pubs.commissioning-body Systems Optimization Laboratory, Management Science and Engineering Department en
pubs.end-page 19 en
dc.rights.accessrights http://purl.org/eprint/accessRights/RestrictedAccess en
pubs.subtype Technical Report en
pubs.elements-id 22611 en
dc.relation.isnodouble 17007 *
pubs.org-id Engineering en
pubs.org-id Engineering Science en
pubs.record-created-at-source-date 2010-09-01 en

