Abstract
Deep learning has had a transformative impact on computer vision, but this success comes at a significant cost: the models and procedures used are so complex and intertwined that it is often impossible to distinguish the impact of the individual design and engineering choices each model embodies. This ambiguity hampers progress in the field and leads to a situation where developing a state-of-the-art model is as much an art as a science. As a step towards addressing this problem, we present an extensive exploration of the effects of the myriad architectural and hyperparameter choices that must be made in generating a state-of-the-art model. The model we study is of particular interest because it won the 2017 Visual Question Answering Challenge. We provide a detailed analysis of the impact of each choice on model performance, in the hope that it will not only inform others in developing models but also set a precedent that will accelerate scientific progress in the field.