Safe Exploration for Optimization with Gaussian Processes


Abstract

We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multi-armed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified “safety” threshold, a requirement that existing algorithms fail to meet. Examples include medical applications where patient comfort must be guaranteed, recommender systems aiming to avoid user dissatisfaction, and robotic control, where one seeks to avoid controls causing physical harm to the platform. We tackle this novel, yet rich, set of problems under the assumption that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop an efficient algorithm called SafeOpt, and theoretically guarantee its convergence to a natural notion of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic data, as well as two real applications: movie recommendation, and therapeutic spinal cord stimulation.

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).
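As a rough illustration of the safety requirement described in the abstract, the following Python sketch fits a Gaussian process to the observations and only proposes sample points whose lower confidence bound stays above the threshold. It is a simplified stand-in, not the authors' SafeOpt algorithm: the abstract does not specify the method's details, so the RBF kernel, the hyperparameters, the confidence multiplier `beta`, the one-dimensional grid, and the function names `rbf_kernel`, `gp_posterior`, and `next_safe_point` are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise_var=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_grid, x_grid).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def next_safe_point(x_obs, y_obs, x_grid, threshold, beta=2.0):
    """Pick the most uncertain grid point whose lower bound clears the threshold.

    Returns (index, safe_mask); index is None if no point is certified safe.
    This is a simplified selection rule, not the published SafeOpt rule.
    """
    mean, std = gp_posterior(x_obs, y_obs, x_grid)
    lower = mean - beta * std          # pessimistic estimate of f
    safe = lower >= threshold          # points believed safe with high confidence
    if not safe.any():
        return None, safe
    width = np.where(safe, std, -np.inf)  # never select an uncertified point
    return int(np.argmax(width)), safe


if __name__ == "__main__":
    f = lambda x: np.sin(3 * x) + 1.0  # hypothetical objective for the demo
    x_grid = np.linspace(0.0, 2.0, 101)
    x_obs = np.array([0.5])            # a single seed point assumed to be safe
    y_obs = f(x_obs)
    for _ in range(10):
        idx, safe = next_safe_point(x_obs, y_obs, x_grid, threshold=0.0)
        if idx is None:
            break
        x_obs = np.append(x_obs, x_grid[idx])
        y_obs = np.append(y_obs, f(x_grid[idx]))
    print("points certified safe:", int(safe.sum()), "of", len(x_grid))
```

Starting from one seed point assumed to be safe, the certified region can only grow outward as nearby samples shrink the posterior uncertainty, which mirrors the abstract's notion of an optimum "reachable under safety constraints". The sketch explores only; it does not trade exploration off against exploitation.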

