Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

2020-03-06

Abstract

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimating the entire distribution of the value function, and one must find a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application.

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).
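The two ingredients named in the abstract, an empirical-distribution estimate of the CPT-value and an SPSA gradient estimate, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration rather than the authors' implementation: the function names (tk_weight, cpt_estimate, spsa_gradient) are hypothetical, the estimator is written in one common order-statistics form, and the Tversky-Kahneman weighting function is only one possible choice of probability distortion.

```python
import random


def tk_weight(p, gamma):
    """Tversky-Kahneman probability weighting function (an assumed choice)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def cpt_estimate(samples, u_plus, u_minus, w_plus, w_minus):
    """Estimate a CPT-value from i.i.d. samples via their order statistics.

    u_plus and u_minus are non-negative utilities for gains and losses;
    w_plus and w_minus are weight functions with w(0) = 0 and w(1) = 1.
    """
    xs = sorted(samples)  # ascending order statistics
    n = len(xs)
    gains = 0.0
    losses = 0.0
    for i, x in enumerate(xs, start=1):
        # Gains are weighted by the distorted upper tail of the empirical CDF.
        gains += u_plus(x) * (w_plus((n + 1 - i) / n) - w_plus((n - i) / n))
        # Losses are weighted by the distorted lower tail.
        losses += u_minus(x) * (w_minus(i / n) - w_minus((i - 1) / n))
    return gains - losses


def spsa_gradient(objective, theta, delta, rng):
    """One two-sided SPSA gradient estimate with Rademacher perturbations."""
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    up = [t + delta * s for t, s in zip(theta, signs)]
    down = [t - delta * s for t, s in zip(theta, signs)]
    diff = objective(up) - objective(down)
    # Only two objective evaluations are needed, whatever the dimension.
    return [diff / (2.0 * delta * s) for s in signs]
```

In a control loop of the kind the abstract describes, objective(theta) would simulate the policy with parameter theta, collect returns, and pass them to cpt_estimate; the SPSA estimate would then drive a gradient step on theta. With identity weights and utilities u_plus(x) = max(x, 0), u_minus(x) = max(-x, 0), the estimate reduces to the sample mean, which is a quick sanity check on the estimator.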
