Abstract
The problem of data augmentation in feature space is
considered. A new architecture, denoted the FeATure TransfEr Network (FATTEN), is proposed to model feature trajectories induced by variations of object pose. This
architecture exploits a parametrization of the pose manifold in terms of pose and appearance. This leads to a deep
encoder/decoder network architecture, where the encoder
factors into an appearance and a pose predictor. Unlike
previous attempts at trajectory transfer, FATTEN can be
efficiently trained end-to-end, with no need to train separate feature transfer functions. This is realized by supplying
the decoder with information about a target pose and the
use of a multi-task loss that penalizes category and pose mismatches. As a result, FATTEN discourages discontinuous
or non-smooth trajectories that fail to capture the structure
of the pose manifold, and generalizes well to object recognition tasks involving large pose variation. Experimental
results on the synthetic ModelNet database show that it can
successfully learn to map source features to target features
of a desired pose, while preserving class identity. Most notably, by using feature space transfer for data augmentation
(w.r.t. pose and depth) on SUN-RGBD objects, we demonstrate considerable performance improvements on one/few-shot object recognition in a transfer learning setup, compared to current state-of-the-art methods.
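The encoder/decoder design and multi-task loss described in the abstract can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: the layer sizes, the single linear layer per predictor, the use of softmax outputs as the appearance and pose codes, and the re-use of the encoder's predictors to score the transferred feature are all assumptions made for this sketch. Weights are random and untrained; only the data flow and the structure of the loss are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper)
FEAT_DIM = 64     # dimension of the input feature x
N_CLASSES = 10    # number of object categories
N_POSES = 12      # number of discretized pose bins

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the integer labels under softmax(logits)
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def init(n_in, n_out):
    return rng.normal(0.0, 0.1, size=(n_in, n_out))

# Encoder: factors into an appearance (category) predictor and a pose predictor.
W_app = init(FEAT_DIM, N_CLASSES)
W_pose = init(FEAT_DIM, N_POSES)
# Decoder: maps [appearance code, pose code, one-hot target pose] back to feature space.
W_dec1 = init(N_CLASSES + 2 * N_POSES, 128)
W_dec2 = init(128, FEAT_DIM)

def encode(x):
    """Return (category logits, pose logits) for a batch of features."""
    return x @ W_app, x @ W_pose

def transfer(x, target_pose):
    """Map source features x to features of the requested target pose."""
    app_logits, pose_logits = encode(x)
    target = np.eye(N_POSES)[target_pose]  # one-hot target pose supplied to the decoder
    z = np.concatenate([softmax(app_logits), softmax(pose_logits), target], axis=1)
    return relu(z @ W_dec1) @ W_dec2

def multitask_loss(x, labels, target_pose, lam=1.0):
    """Category mismatch + pose mismatch, both measured on the transferred feature."""
    x_hat = transfer(x, target_pose)
    app_logits, pose_logits = encode(x_hat)
    return cross_entropy(app_logits, labels) + lam * cross_entropy(pose_logits, target_pose)

# Usage: transfer a batch of 4 source features to chosen target poses
x = rng.normal(size=(4, FEAT_DIM))
labels = np.array([0, 3, 3, 7])
target_pose = np.array([1, 1, 5, 11])
print(transfer(x, target_pose).shape)  # (4, 64)
print(multitask_loss(x, labels, target_pose))
```

In training, the weights would be optimized end-to-end to minimize `multitask_loss`, so that a transferred feature is still recognized as its source category while also being recognized as the requested target pose; the weighting `lam` between the two terms is a hypothetical parameter of this sketch.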