Abstract
Most existing zero-shot learning methods treat the
problem as one of visual-semantic embedding. Given the
demonstrated capability of Generative Adversarial Networks (GANs) to generate images, we instead leverage
GANs to imagine unseen categories from text descriptions
and hence recognize novel classes without having seen any examples
of them. Specifically, we propose a simple yet effective generative model that takes as input noisy text descriptions of
an unseen class (e.g., Wikipedia articles) and generates synthesized visual features for this class. With added pseudo
data, zero-shot learning is naturally converted to a traditional classification problem. Additionally, to preserve the
inter-class discrimination of the generated features, a visual pivot regularization is proposed as an explicit supervision. Unlike previous methods using complex engineered
regularizers, our approach can suppress the noise well without additional regularization. Empirically, we show that
our method consistently outperforms the state of the art on
the largest available benchmarks on Text-based Zero-shot
Learning.
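
The pipeline summarized above (text description to synthesized features, a pivot-style regularizer, then ordinary classification) can be illustrated with a minimal NumPy sketch. This is not the model proposed here: the generator is a hypothetical single linear layer rather than a trained GAN, the regularizer assumes the visual pivot of a class is the mean of its real features, and the classifier is a simple nearest-centroid rule. All function names, dimensions, and weights are illustrative assumptions.

```python
import numpy as np


def generate_features(text_emb, W, rng, n_samples, noise_dim):
    """Toy stand-in for a conditional generator (assumption: one linear layer).

    Concatenates the class text embedding with Gaussian noise and maps the
    result to a visual feature space.
    """
    z = rng.standard_normal((n_samples, noise_dim))
    t = np.repeat(text_emb[None, :], n_samples, axis=0)
    return np.tanh(np.concatenate([t, z], axis=1) @ W)


def visual_pivot_loss(fake_feats, real_feats):
    """Distance between the mean synthesized and mean real feature of a class.

    Assumption: the "visual pivot" is taken to be the class mean of real
    features; the abstract does not spell out the exact form.
    """
    return float(np.linalg.norm(fake_feats.mean(axis=0) - real_feats.mean(axis=0)))


def nearest_centroid_predict(x, centroids):
    """Assign each row of x to the closest class centroid (Euclidean)."""
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_dim, noise_dim, feat_dim = 4, 3, 5
    W = rng.standard_normal((text_dim + noise_dim, feat_dim))  # untrained weights

    # Two hypothetical unseen classes, each described only by a text embedding.
    text_embs = rng.standard_normal((2, text_dim))
    centroids = np.stack([
        generate_features(t, W, rng, n_samples=50, noise_dim=noise_dim).mean(axis=0)
        for t in text_embs
    ])

    # Classify some query features using centroids of the synthesized features.
    queries = generate_features(text_embs[0], W, rng, n_samples=5, noise_dim=noise_dim)
    print(nearest_centroid_predict(queries, centroids))
```

Once synthesized features exist for every unseen class, any standard classifier can be substituted for the nearest-centroid rule; it is used here only to keep the sketch short.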