Abstract
We propose a new learning method to infer a mid-level feature representation that combines the advantage of semantic attribute representations with the higher expressive power of non-semantic features. The idea is to augment an existing attribute-based representation with additional dimensions, which are learned by coupling an autoencoder model with a large-margin principle. This construction allows a smooth transition between the zero-shot regime with no training examples, the unsupervised regime with training examples but without class labels, and the supervised regime with training examples and class labels. The resulting optimization problem can be solved efficiently, because several of the necessary steps have closed-form solutions. Through extensive experiments we show that the augmented representation achieves higher object categorization accuracy than the semantic representation alone.