Abstract
Mini-batch gradient-descent-based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures both the informativeness of each sample and the diversity of the selected subset. We design an efficient greedy algorithm that yields high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that deep models trained with the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy, across different learning rates, batch sizes, and distance metrics.
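
The abstract does not spell out the objective or the selection routine, but a minimal sketch of the kind of greedy submodular maximization it alludes to might look as follows. All names here (`greedy_batch_selection`, the facility-location coverage term standing in for diversity, the random scores standing in for per-sample informativeness) are illustrative assumptions, not the paper's actual formulation; the only grounded ingredients are a monotone submodular objective combining informativeness and diversity, maximized greedily under a batch-size constraint.

```python
import numpy as np

def greedy_batch_selection(scores, sim, batch_size):
    """Greedily pick a mini-batch that maximizes a monotone submodular
    objective: per-sample informativeness plus a facility-location
    coverage term as a proxy for diversity.

    scores : (n,) hypothetical informativeness of each sample
    sim    : (n, n) nonnegative pairwise similarity matrix
    """
    n = len(scores)
    selected = []
    coverage = np.zeros(n)  # best similarity of each point to the current batch
    for _ in range(batch_size):
        best_gain, best_j = -np.inf, None
        for j in range(n):
            if j in selected:
                continue
            # marginal gain = informativeness + increase in total coverage
            gain = scores[j] + (np.maximum(coverage, sim[j]) - coverage).sum()
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        coverage = np.maximum(coverage, sim[best_j])
    return selected

# Toy usage: 100 samples with random features and loss-like scores.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
sim = np.exp(-d2 / feats.shape[1])   # RBF similarity, nonnegative
scores = rng.random(100)             # stand-in for per-sample informativeness
print(greedy_batch_selection(scores, sim, batch_size=8))
```

For a monotone submodular objective with a cardinality constraint, this greedy rule carries the classical (1 - 1/e) approximation guarantee, which is presumably what the abstract means by high-quality solutions to an NP-hard combinatorial problem.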