Abstract
Visual question answering (VQA) and visual question
generation (VQG) are two trending topics in computer
vision, but they are usually explored separately despite their
intrinsic complementary relationship. In this paper, we propose an end-to-end unified model, the Invertible Question
Answering Network (iQAN), to introduce question generation as a dual task of question answering to improve the
VQA performance. With our proposed invertible bilinear fusion module and parameter sharing scheme, our iQAN can
accomplish VQA and its dual task VQG simultaneously. By
jointly training on the two tasks with our proposed dual
regularizers (termed Dual Training), our model gains a better
understanding of the interactions among images, questions,
and answers. After training, iQAN can take either a question or an answer as input and output the counterpart. Evaluated on the CLEVR and VQA2 datasets, our iQAN improves
the top-1 accuracy of the prior art MUTAN VQA method
by 1.33% and 0.88% (absolute increase), respectively. We
also show that our proposed dual training framework can
consistently improves the performance of many popular
VQA architectures.