Using Contextual Bandits with Behavioral Constraints for Constrained Online Movie Recommendation

资源分类

2019-11-07 |

81 |

43 |

Abstract AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. In many cases the rewards should not be the only guiding criteria, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online system, based on an extension of the contextual bandits framework, that learns a set of behavioral constraints by observation and uses these constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. In addition, our system can highlight features of the context which are more predicted to be more rewarding and/or are in line with the behavioral constraints. We demonstrate the system by building an interactive interface for an online movie recommendation agent and show that our system is able to act within a set of behavior constraints without significantly degrading overall performance.

上一篇：Grounded Language Learning: Where Robotics and NLP Meet* Cynthia Matuszek

下一篇：CISA: Chinese Information Structure Analysis for Scientific Writing with Cross-lingual Adversarial Learning

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
Learning to learn...

The move from hand-designed features to learned...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Learning to Predi...

Much of model-based reinforcement learning invo...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com