We show that the λ-return target used in the TD(λ) family of temporal difference learning algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the γ-return estimator, an alternative target based on a more accurate model of variance, which defines the TDγ family of complex-backup temporal difference learning algorithms. We derive TDγ, the γ-return equivalent of the original TD(λ) algorithm, which eliminates the λ parameter but can only perform updates at the end of an episode and requires time and space proportional to the episode length. We then derive a second algorithm, TDγ(C), with a capacity parameter C, which is incremental and online but requires C times more time and memory than TD(λ). We show that TDγ outperforms TD(λ) for any setting of λ on 4 out of 5 benchmark domains, and that TDγ(C) performs as well as or better than TDγ for intermediate settings of C.
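
For reference, a sketch of the standard λ-return target, with G_t^{(n)} denoting the n-step return from time t (notation assumed here rather than defined above):

\[
G_t^{\lambda} \;=\; (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},
\]

i.e., an exponentially weighted average of n-step returns, whose geometric weighting scheme the γ-return replaces with weights derived from a more accurate model of how variance grows with n.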