Abstract. We present a novel regularization approach to train neural
networks that achieves better generalization and lower test error than standard
stochastic gradient descent. Our approach is based on the principles of
cross-validation, where a validation set is used to limit model overfitting. We formulate these principles as a bilevel optimization problem.
This formulation allows us to optimize a cost on the
validation set subject to another optimization on the training set.
Overfitting is controlled by introducing a weight for each mini-batch in
the training set and by choosing the weight values so that they minimize the
error on the validation set. In practice, these weights define mini-batch
learning rates in a gradient descent update equation that favor gradients
with better generalization capabilities. Because of its simplicity, this approach can be integrated with other regularization methods and training
schemes. We extensively evaluate our proposed algorithm on several neural network architectures and datasets and find that it consistently improves the generalization of the model, especially when labels are noisy.
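To make the abstract's description concrete, a minimal sketch of the bilevel formulation follows; the notation ($\omega_i$, $\theta$, $\mathcal{L}_{\text{train}}$, $\mathcal{L}_{\text{val}}$, $\eta$) is illustrative and not necessarily the paper's own:
\begin{equation}
    \min_{\omega \ge 0} \; \mathcal{L}_{\text{val}}\big(\theta^{\star}(\omega)\big)
    \quad \text{s.t.} \quad
    \theta^{\star}(\omega) = \arg\min_{\theta} \sum_{i} \omega_i \, \mathcal{L}_{\text{train}}^{(i)}(\theta),
\end{equation}
where $\omega_i$ is the weight assigned to mini-batch $i$. In the inner optimization, this weight effectively scales the step size of the corresponding gradient descent update, $\theta \leftarrow \theta - \eta\, \omega_i \nabla_\theta \mathcal{L}_{\text{train}}^{(i)}(\theta)$, so mini-batches whose gradients reduce the validation error receive larger learning rates.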