Abstract
The options framework provides a concrete way to
implement and reason about temporally extended
actions. Existing literature has demonstrated the
value of planning with options empirically, but
there is a lack of theoretical analysis formalizing
when planning with options is more efficient than
planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm
called Fitted Value Iteration (FVI) with options.
Our analysis reveals that longer duration options
and a pessimistic estimate of the value function
both lead to faster convergence. Furthermore, options can improve convergence even when they are
suboptimal and sparsely distributed throughout the
state space. Next we consider generating useful options for planning based on a subset of landmark
states. This suggests a new algorithm, Landmarkbased AVI (LAVI), that represents the value function only at landmark states. We analyze OFVI and
LAVI using the proposed landmark-based options
and compare the two algorithms. Our theoretical
and experimental results demonstrate that options
can play an important role in AVI by decreasing approximation error and inducing fast convergence