Environment Upgrade Reinforcement Learning for Non-differentiable Multi-stage Pipelines


2019-10-14


Abstract: Recent advances in multi-stage algorithms have shown great promise, but two important problems remain. First, at inference time, information cannot feed back from downstream stages to upstream stages. Second, at training time, end-to-end training is not possible if the overall pipeline involves non-differentiable functions, so the different stages cannot be jointly optimized. In this paper, we propose a novel environment upgrade reinforcement learning framework to solve the feedback and joint optimization problems. Our framework re-links the downstream stage to the upstream stage through a reinforcement learning agent. While training the agent to improve final performance by refining the upstream stage's output, we also upgrade the downstream stage (the environment) according to the agent's policy. In this way, the agent policy and the environment are jointly optimized. We propose a training algorithm for this framework that addresses the different training demands of the agent and the environment. Experiments on instance segmentation and human pose estimation demonstrate the effectiveness of the proposed framework.
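The alternating scheme described in the abstract can be illustrated with a deliberately tiny toy, which is not the paper's algorithm. The sketch assumes a scalar pipeline: a hypothetical upstream stage with a fixed offset, a non-differentiable downstream quantizer, and an "agent" reduced to a single refinement action chosen by exhaustive evaluation, standing in for a learned policy. All names (`upstream`, `downstream`, `train_jointly`) and all numbers are assumptions made for illustration.

```python
# Toy illustration only: every name and number here is hypothetical.
TARGETS = list(range(10))      # ground-truth values the pipeline should output
ACTIONS = [-2, -1, 0, 1, 2]    # refinements the agent may add to the upstream output
BIN_WIDTHS = [1, 2, 4]         # candidate settings of the downstream stage


def upstream(target):
    """Upstream stage with a systematic offset of +2."""
    return target + 2


def downstream(x, bin_width):
    """Non-differentiable downstream stage: snap x down to a bin edge."""
    return (x // bin_width) * bin_width


def total_error(action, bin_width):
    """Summed absolute error of the final output over all targets."""
    return sum(
        abs(downstream(upstream(t) + action, bin_width) - t) for t in TARGETS
    )


def train_jointly(action=0, bin_width=4, rounds=3):
    """Alternate between an agent phase and an environment-upgrade phase."""
    history = [total_error(action, bin_width)]
    for _ in range(rounds):
        # Agent phase: the environment is frozen; keep the best refinement.
        action = min(ACTIONS, key=lambda a: total_error(a, bin_width))
        # Environment upgrade: the agent is frozen; refit the downstream
        # stage on the refined inputs it now receives (no gradients needed).
        bin_width = min(BIN_WIDTHS, key=lambda b: total_error(action, b))
        history.append(total_error(action, bin_width))
    return action, bin_width, history


if __name__ == "__main__":
    print(train_jointly())
```

In this toy, tuning only the action (with the bin width fixed at 4) or only the bin width (with the action fixed at 0) leaves the summed error at 9, whereas alternating the two phases drives it to 0. This mirrors, in a very reduced form, the motivation for optimizing the agent and the environment together.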



