Batch-Switching Policy Iteration

资源分类

2019-11-22 |

142 |

122 |

Abstract Policy Iteration (PI) is a widely-used family of algorithms for computing an optimal policy for a given Markov Decision Problem (MDP). Starting with an arbitrary initial policy, PI repeatedly updates to a dominating policy until an optimal policy is found. The update step involves switching the actions corresponding to a set of “improvable” states, which are easily identified. Whereas progress is guaranteed even if just one improvable state is switched at every step, the canonical variant of PI, attributed to Howard [1960], switches every improvable state in order to obtain the next iterate. For MDPs with n states and 2 actions per state, the tightest known bound on the complexity of Howard’s PI is O(2n /n) iterations. To date, the tightest bound known across all variants of PI is O(1.7172n ) expected iterations for a randomised variant introduced by Mansour and Singh [1999]. We introduce Batch-Switching Policy Iteration (BSPI), a family of deterministic PI algorithms that switches states in “batches”, taking the batch size b as a parameter. By varying b, BSPI interpolates between Howard’s PI and another previously-studied variant called Simple PI [Melekopoglou and Condon, 1994]. Our main contribution is a bound of O(1.6479n ) on the number of iterations taken by an instance of BSPI. We believe this is the tightest bound shown yet for any variant of PI. We also present experimental results that suggest Howard’s PI might itself enjoy an even tighter bound.

上一篇：Anticipatory Troubleshooting

下一篇：In Search of Tractability for Partial Satisfaction Planning

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Regularizing RNNs...

Recently, caption generation with an encoder-de...
Deep Cross-media ...

Cross-media retrieval is a research hotspot in ...
Compact MDDs for ...

Pseudo-Boolean (PB) constraints are usually en...
Supervised Descen...

Many computer vision problems (e.
Predictive-Correc...

While deep feature learning has revolutionized ...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com