Abstract
Zero-shot recognition aims to accurately recognize objects of unseen classes by using a shared visual-semantic
mapping between the image feature space and the semantic embedding space. This mapping is learned on training
data of seen classes and is expected to have transfer ability
to unseen classes. In this paper, we tackle this problem by
exploiting the intrinsic relationship between the semantic
space manifold and the transfer ability of the visual-semantic
mapping. We formalize their connection and cast zero-shot
recognition as a joint optimization problem. Motivated by
this, we propose a novel framework for zero-shot recognition, which contains dual visual-semantic mapping paths.
Our analysis shows that this framework can not only apply prior semantic knowledge to infer the underlying semantic manifold in the image feature space, but also generate an optimized
semantic embedding space, which can enhance the transfer
ability of the visual-semantic mapping to unseen classes.
The proposed method is evaluated for zero-shot recognition
on four benchmark datasets, achieving outstanding results