Abstract

The soundness and optimality of a plan depend on the correctness of the domain model. Specifying complete domain models can be difficult when interactions between an agent and its environment are complex. We propose a model-based reinforcement learning (MBRL) approach to solve planning problems with unknown models. The model is learned incrementally over episodes using only experiences from the current episode, which suits non-stationary environments. We introduce the novel concept of reliability as an intrinsic motivation for MBRL, and a method to learn from failure to prevent repeated instances of similar failures. Our motivation is to improve the learning efficiency and goal-directedness of MBRL. We evaluate our work with experimental results for three planning domains.