Stick-Breaking Policy Learning in Dec-POMDPs

2019-11-19
Abstract: Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to local maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs that are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
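For intuition about how a stick-breaking prior yields a variable-sized controller: a unit "stick" of probability mass is broken repeatedly, with each break proportion drawn from a Beta distribution, so node weights decay stochastically and nodes with negligible mass can be pruned. The sketch below is a minimal illustration of the generic (truncated) stick-breaking construction, not the paper's Dec-SBPR inference procedure; the concentration parameter `alpha`, the truncation level `max_nodes`, and the function name are assumptions made for illustration.

```python
import numpy as np

def stick_breaking_weights(alpha=1.0, max_nodes=20, rng=None):
    """Draw node probabilities from a truncated stick-breaking prior.

    Each break proportion v_k ~ Beta(1, alpha) takes a fraction of the
    stick remaining after the first k-1 breaks, so weights sum to at most 1
    and decay stochastically; larger alpha spreads mass over more nodes.
    (Illustrative sketch only, not the paper's Dec-SBPR algorithm.)
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha, size=max_nodes)               # break proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining                                   # pi_k = v_k * prod_{j<k} (1 - v_j)

if __name__ == "__main__":
    pi = stick_breaking_weights(alpha=2.0, max_nodes=15)
    # Nodes with negligible weight can be pruned, giving a variable-sized FSC.
    print(np.round(pi, 3), "total mass:", round(pi.sum(), 3))
```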
