Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations

2019-11-22
Abstract

Control applications often feature tasks with similar, but not identical, dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric regression approach for learning its structure from data. We show that a learned HiP-MDP rapidly identifies the dynamics of new task instances in several settings, flexibly adapting to task variation.

Many control applications involve repeated encounters with domains that have similar, but not identical, dynamics. An agent that swings bats may encounter several bats with different weights or lengths, while an agent that manipulates cups may encounter cups with different amounts of liquid. An agent that drives cars may encounter many different cars, each with unique handling characteristics. In all of these scenarios, it makes little sense for the agent to start afresh when it encounters a new bat, a new cup, or a new car. Exposure to a variety of related domains should lead to faster and more reliable adaptation to a new instance of the same type of domain, via transfer learning. If an agent has already swung several bats, for example, we would hope that it could easily learn to swing a new bat.

Why? Like many domains, swinging a bat has a low-dimensional representation that affects its dynamics in structured ways. The agent's prior experience should allow it to learn both how to model related instances of a domain (for example, through the bat's length, which smoothly changes the bat's dynamics) and which specific model parameters (e.g., lengths) are likely.

We introduce the Hidden Parameter Markov Decision Process (HiP-MDP) as a formalization of these types of domains, with two important features. First, we posit that there exist a bounded number of latent parameters that, if known, would fully specify the dynamics of each individual task.
Second, we assume that the parameter values remain fixed for a task's duration (e.g., the bat's length will not change during a swing), and that the agent will know when a change has occurred (e.g., getting a new bat). The HiP-MDP parameters encode the minimum learning

* Both authors are primary authors.
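The two features above can be illustrated with a toy example. The first is a small set of latent parameters that fully determine each task's dynamics. The second is that those parameters stay fixed within a task. The Python sketch below is not the paper's semiparametric regression method.

Instead, it assumes a hypothetical one-parameter, pendulum-style "bat" model whose form is known. In this model, `theta` stands in for the bat's length. The sketch then identifies `theta` for a new task by grid-search least squares over transitions observed from that task. The names `step`, `collect_transitions` and `identify_theta` are illustrative only.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)


def step(state, action, theta, dt=0.05):
    """One Euler step of a hypothetical 'bat swing' system.

    theta is the hidden task parameter (a stand-in for bat length);
    this dynamics model is an assumption made for illustration only.
    """
    angle, velocity = state
    # Longer bats (larger theta) respond more sluggishly to gravity and torque.
    accel = -(G / theta) * np.sin(angle) + action / theta**2
    new_velocity = velocity + dt * accel
    new_angle = angle + dt * new_velocity
    return np.array([new_angle, new_velocity])


def collect_transitions(theta, n, rng, noise=1e-3):
    """Sample (s, a, s') tuples from one task instance.

    theta stays fixed for every transition, mirroring the HiP-MDP
    assumption that parameters do not change within a task.
    """
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0, size=2)
        a = rng.uniform(-1.0, 1.0)
        s_next = step(s, a, theta) + rng.normal(0.0, noise, size=2)
        data.append((s, a, s_next))
    return data


def identify_theta(data, candidates):
    """Return the candidate theta whose predicted next states best match
    the observed next states (sum of squared errors over a grid)."""
    errors = [
        sum(np.sum((step(s, a, th) - s_next) ** 2) for s, a, s_next in data)
        for th in candidates
    ]
    return candidates[int(np.argmin(errors))]


rng = np.random.default_rng(0)
new_task = collect_transitions(theta=1.2, n=50, rng=rng)
candidates = np.linspace(0.5, 2.0, 151)  # grid with spacing 0.01
estimate = identify_theta(new_task, candidates)
print(f"estimated theta: {estimate:.2f}")
```

Because `theta` is fixed for the task's duration, every transition from the new task constrains the same single unknown. In this toy setting, a few dozen transitions therefore pin it down. The HiP-MDP described in the abstract goes further: it learns the structure of the dynamics family from data rather than assuming the model form, as this sketch does.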
