Abstract
Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-off between the quality of the resulting approximation and the associated computational cost. However, KBSF’s memory usage grows linearly with the number of transitions, precluding its application in scenarios where a large amount of data must be processed. In this paper we show that it is possible to construct KBSF’s MDP in a fully incremental way, thus freeing the space complexity of this algorithm from its dependence on the number of sample transitions. The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. We present theoretical results showing that KBSF can approximate the value function that would be computed by conventional kernel-based learning with arbitrary precision. We empirically demonstrate the effectiveness of the proposed algorithm in the challenging threepole balancing task, in which the ability to process a large number of transitions is crucial for success.