Abstract
Zero-shot learning (ZSL) aims to build models to
recognize novel visual categories that have no associated labelled training samples. The basic framework is to transfer knowledge from seen classes
to unseen classes by learning the visual-semantic
embedding. However, most approaches do not
preserve the underlying sub-manifold of samples
in the embedding space. In addition, whether the
mapping can precisely reconstruct the original visual features has not been investigated in depth. To
solve these problems, we propose a novel framework named Graph and Autoencoder Based Feature Extraction (GAFE), which seeks a low-rank mapping that preserves the sub-manifold of samples. Following the encoder-decoder paradigm, the encoder
learns a mapping from the visual features to the semantic space, while the decoder reconstructs the
original features with the learned mapping. Furthermore, a graph is constructed to ensure that the learned
mapping can preserve the local intrinsic structure
of the data. Finally, an L2,1-norm sparsity constraint is imposed on the mapping to identify features relevant to the target domain. Extensive experiments on five attribute datasets demonstrate the
effectiveness of the proposed model.
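
The abstract names three ingredients (a tied encoder-decoder mapping, a graph term that preserves local structure, and an L2,1-norm penalty) but does not state the objective itself. The following is only an illustrative sketch of how such terms are commonly combined, not the formulation used by GAFE. The function names, the k-nearest-neighbour graph construction, the tied-weight decoder, and the weights alpha, beta, and gamma are all assumptions introduced here for illustration.

```python
import numpy as np

def l21_norm(W):
    """L2,1 norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def knn_graph_laplacian(X, k=2):
    """Unnormalized Laplacian L = D - A of a symmetrized k-NN graph.

    Assumption: a binary k-NN affinity; the abstract does not say how
    the graph is built.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-matches
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    A = np.maximum(A, A.T)                            # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def objective(X, S, W, L, alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical objective with X (n x d), S (n x k), W (d x k)."""
    P = X @ W                                    # encoder: visual -> semantic
    encode = np.linalg.norm(P - S) ** 2          # fit to semantic vectors
    decode = np.linalg.norm(X - S @ W.T) ** 2    # decoder with tied weights
    graph = np.trace(P.T @ L @ P)                # local-structure penalty
    return float(encode + alpha * decode + beta * graph + gamma * l21_norm(W))

# Small random example (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))   # 6 samples, 4 visual features
S_demo = rng.normal(size=(6, 3))   # 3 semantic attributes
W_demo = rng.normal(size=(4, 3))
L_demo = knn_graph_laplacian(X_demo, k=2)
print(objective(X_demo, S_demo, W_demo, L_demo))
```

In this sketch, each row of W corresponds to one visual feature, so the L2,1 penalty pushes entire rows toward zero and thereby acts as a feature selector, while the trace term is small when samples that are neighbours in the visual space remain close after projection.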