Abstract
Recommendations are commonly used to modify users’ natural behavior, for example, to increase product sales or the time spent on a website. This results in a gap between the ultimate business objective and the classical setup, where recommendations are optimized to be coherent with past user behavior. To bridge this gap, we propose a new learning setup for recommendation that optimizes for the Incremental Treatment Effect (ITE) of the policy. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy, and we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to behavior under random exposure. We compare our method against state-of-the-art factorization methods, as well as new approaches to causal recommendation, and show significant improvements.