Abstract
Zero-shot learning (ZSL) aims to recognize unseen image categories by learning an embedding space between
image and semantic representations. For years, existing work has focused on learning the proper
mapping matrices that align the visual and semantic spaces,
while the importance of learning discriminative representations for ZSL has been ignored. In this work, we revisit existing
methods and demonstrate the necessity of learning discriminative representations for both visual and semantic instances
of ZSL. We propose an end-to-end network that is capable
of 1) automatically discovering discriminative regions by
a zoom network; and 2) learning discriminative semantic
representations in an augmented space introduced for both
user-defined and latent attributes. Our proposed method is
tested extensively on two challenging ZSL datasets, and the
experimental results show that the proposed method significantly
outperforms state-of-the-art methods.