Abstract
We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory-bounded
approach to partially observable open-loop planning. SYMBOL maintains an adaptive
stack of Thompson Sampling bandits, whose size
is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond
a generative model. We empirically test SYMBOL on four large POMDP benchmark problems to
demonstrate its effectiveness and its robustness w.r.t.
the choice of hyperparameters, and we evaluate its
adaptive memory consumption. We also compare
its performance with other open-loop planning algorithms and POMCP.
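
To make the idea of a horizon-bounded stack of Thompson Sampling bandits concrete, the following is a minimal sketch and not the SYMBOL algorithm itself. Several parts are assumptions made for illustration: the class names `GaussianThompsonBandit` and `BanditStack`, the simplified Gaussian posterior per action, the `step(state, action, rng)` signature of the generative model, and the growth rule (a bandit is created only when a simulation first reaches its depth). The sketch only shows that memory stays bounded by the planning horizon regardless of how many simulations are run.

```python
import random


class GaussianThompsonBandit:
    """Thompson Sampling bandit with a simplified Gaussian posterior per action.

    Assumption: the posterior for each action is Normal(mean, 1/sqrt(n + 1)),
    chosen for brevity rather than taken from the paper.
    """

    def __init__(self, n_actions, rng):
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions
        self.rng = rng

    def select(self):
        # Draw one sample per action from its posterior and pick the largest.
        samples = [
            self.rng.gauss(mean, 1.0 / (count + 1) ** 0.5)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, ret):
        # Incremental update of the running mean return for this action.
        self.counts[action] += 1
        self.means[action] += (ret - self.means[action]) / self.counts[action]


class BanditStack:
    """Open-loop planner sketch: one bandit per depth, at most `horizon` bandits."""

    def __init__(self, n_actions, horizon, seed=0):
        self.n_actions = n_actions
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.bandits = []  # grows lazily, never beyond `horizon` entries

    def simulate(self, step, state, gamma=0.95):
        """Run one simulation with the generative model `step` (hypothetical
        signature: step(state, action, rng) -> (next_state, reward, done))."""
        trajectory = []
        for depth in range(self.horizon):
            if depth == len(self.bandits):
                # Hypothetical growth rule: add a bandit on first visit to a depth.
                self.bandits.append(GaussianThompsonBandit(self.n_actions, self.rng))
            action = self.bandits[depth].select()
            state, reward, done = step(state, action, self.rng)
            trajectory.append((depth, action, reward))
            if done:
                break
        # Back up discounted returns from the last step to the first.
        ret = 0.0
        for depth, action, reward in reversed(trajectory):
            ret = reward + gamma * ret
            self.bandits[depth].update(action, ret)
        return ret
```

Because the bandits are indexed by depth only and not by observation history, the number of stored bandits in this sketch cannot exceed the horizon passed to `BanditStack`, in contrast to a closed-loop search tree that grows with the number of simulations.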