
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control


Abstract

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimating the entire distribution of the value function, and the optimal policy may need to be randomized. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application.

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).
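
As a concrete illustration of the two ingredients named in the abstract, the sketch below computes an empirical CPT-value from i.i.d. samples and takes one SPSA step on a simulation-based CPT objective. This is a minimal sketch, not the authors' implementation: the utility and weight functions use the standard Tversky-Kahneman forms with typical parameter values (alpha = 0.88, lambda = 2.25, gamma = 0.61), and `cpt_objective`, `delta`, and `step` are illustrative names and constants.

```python
import numpy as np

def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting; overweights small probabilities."""
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def cpt_value_estimate(samples,
                       alpha=0.88,   # utility curvature (typical TK value)
                       lam=2.25,     # loss-aversion coefficient (typical TK value)
                       w_plus=tk_weight, w_minus=tk_weight):
    """Empirical CPT-value of a random variable from i.i.d. samples.

    Sorts the samples and weights the utilities of the order statistics by
    differences of distorted tail probabilities, approximating
    C(X) = int_0^inf w+(P(u+(X) > z)) dz - int_0^inf w-(P(u-(X) > z)) dz.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    u_plus = np.maximum(x, 0.0) ** alpha          # utility of gains
    u_minus = lam * np.maximum(-x, 0.0) ** alpha  # disutility of losses
    gains = np.sum(u_plus * (w_plus((n + 1 - i) / n) - w_plus((n - i) / n)))
    losses = np.sum(u_minus * (w_minus(i / n) - w_minus((i - 1) / n)))
    return gains - losses

def spsa_step(theta, cpt_objective, delta=0.05, step=0.01, rng=None):
    """One simultaneous-perturbation ascent step on a noisy CPT objective.

    `cpt_objective(theta)` is assumed to simulate the system under policy
    parameters `theta` and return a CPT-value estimate such as the one above.
    """
    rng = np.random.default_rng() if rng is None else rng
    perturb = rng.choice([-1.0, 1.0], size=np.shape(theta))  # Rademacher directions
    g_hat = (cpt_objective(theta + delta * perturb)
             - cpt_objective(theta - delta * perturb)) / (2.0 * delta * perturb)
    return theta + step * g_hat  # ascend: the CPT-value is maximized
```

For example, `cpt_value_estimate(np.random.default_rng(0).normal(size=10_000))` estimates the CPT-value of a standard normal, and wrapping a simulator's return in `cpt_objective` makes `spsa_step` play the role of the outer optimization loop, with the empirical estimator in the inner loop as the abstract describes.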

Previous: Discriminative Embeddings of Latent Variable Models for Structured Data

Next: Training Deep Neural Networks via Direct Loss Minimization

