Abstract
We introduce an interactive learning framework for the
development and testing of intelligent visual systems, called
learning-by-asking (LBA). We explore LBA in the context of the
Visual Question Answering (VQA) task. LBA differs from
standard VQA training in that most questions are not observed during training, and the learner must ask the questions it wants answered. Thus, LBA more closely mimics natural learning and has the potential to be more data-efficient than the traditional VQA setting. We present a
model that performs LBA on the CLEVR dataset, and show
that it automatically discovers an easy-to-hard curriculum
when learning interactively from an oracle. Our LBA-generated data consistently matches or outperforms the CLEVR
train data and is more sample efficient. We also show that
our model asks questions that generalize to state-of-the-art
VQA models and to novel test-time distributions.