Abstract
We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions. For the adaptive parameter prediction, we employ a separate parameter prediction network, which consists of a gated recurrent unit (GRU) that takes a question as its input and a fully-connected layer that generates a set of candidate weights as its output. However, it is challenging to construct a parameter prediction network for the large number of parameters in the fully-connected dynamic parameter layer of the CNN. We reduce the complexity of this problem by incorporating a hashing technique, where the candidate weights given by the parameter prediction network are selected using a predefined hash function to determine individual weights in the dynamic parameter layer. The proposed network, a joint network of the CNN for ImageQA and the parameter prediction network, is trained end-to-end through back-propagation, with its weights initialized from a pre-trained CNN and GRU. The proposed algorithm achieves state-of-the-art performance on all available public ImageQA benchmarks.
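The hashing step can be illustrated with a minimal sketch under stated assumptions. The parameter prediction network (GRU plus fully-connected layer) is replaced by a fixed list `p` of k candidate weights. Each position (m, n) of the dynamic weight matrix is filled with `p[psi(m, n)]`, where `psi` is a predefined hash. The names `psi`/`hash_index`, the MD5-based hashing, and the extra sign hash `xi`/`hash_sign` (borrowed from HashedNets-style weight sharing) are illustrative assumptions, not the paper's exact functions. The sketch also shows why end-to-end back-propagation still works: the gradient for each candidate weight is the signed sum of the gradients of all positions that hash onto it.

```python
import hashlib
from typing import List

Matrix = List[List[float]]


def _digest(tag: str, m: int, n: int) -> bytes:
    # Fixed, seed-free hash of a weight position (assumption: any deterministic hash works).
    return hashlib.md5(f"{tag}:{m}:{n}".encode()).digest()


def hash_index(m: int, n: int, k: int) -> int:
    """Hypothetical psi(m, n): which of the k candidate weights fills position (m, n)."""
    return int.from_bytes(_digest("idx", m, n)[:8], "little") % k


def hash_sign(m: int, n: int) -> int:
    """Hypothetical xi(m, n) in {+1, -1}: assumed sign hash, as in HashedNets."""
    return 1 if _digest("sign", m, n)[0] & 1 else -1


def dynamic_weights(candidates: List[float], rows: int, cols: int) -> Matrix:
    """Expand k candidate weights into a rows x cols weight matrix via hashing."""
    k = len(candidates)
    return [[hash_sign(m, n) * candidates[hash_index(m, n, k)]
             for n in range(cols)] for m in range(rows)]


def dynamic_fc(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully-connected layer y = W x + b with the question-dependent W."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def candidate_grad(grad_w: Matrix, k: int) -> List[float]:
    """Map dL/dW back to dL/dp: each candidate accumulates the signed
    gradients of every position that the hash assigns to it."""
    grad_p = [0.0] * k
    for m, row in enumerate(grad_w):
        for n, g in enumerate(row):
            grad_p[hash_index(m, n, k)] += hash_sign(m, n) * g
    return grad_p


if __name__ == "__main__":
    p = [0.5, -1.0, 2.0]  # stand-in for the prediction network's output
    W = dynamic_weights(p, rows=4, cols=6)
    y = dynamic_fc([1.0] * 6, W, bias=[0.0] * 4)
    print(len(W), len(W[0]), len(y))  # 4 6 4
```

Only k values are predicted regardless of the layer size (here 3 candidates fill 24 weights), which is the complexity reduction the hashing is meant to provide. Because the hash is fixed, the same question-dependent `p` always produces the same layer.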