Abstract
Generative Adversarial Nets (GANs) and Conditional
GANs (CGANs) show that using a trained network as a loss
function (the discriminator) makes it possible to synthesize highly structured outputs (e.g. natural images). However, using a
discriminator network as a universal loss function for common supervised tasks (e.g. semantic segmentation, line detection, depth estimation) has been considerably less successful.
We argue that the main difficulty of applying CGANs to supervised tasks is that the generator training consists of optimizing a loss function that does not depend directly on
the ground truth labels. To overcome this, we propose to
replace the discriminator with a matching network that takes
into account both the ground truth outputs and the
generated examples. As a consequence, the generator loss
function also depends on the targets of the training examples, thus facilitating learning. We demonstrate on three
computer vision tasks that this approach can significantly
outperform CGANs, achieving results comparable or superior to task-specific solutions while training stably.
Importantly, this is a general approach that does not require
the use of task-specific loss functions.
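The following is a toy sketch, under stated assumptions, of the distinction drawn above between the two generator losses. The linear discriminator, the tanh embedding, and the cosine-similarity matching score are hypothetical stand-ins chosen only for illustration; they are not the architecture proposed in this work. The single point carried over from the text is that the CGAN generator loss never receives the ground truth y, whereas the matching-style loss compares the generated output directly with y.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cgan_generator_loss(x, y_hat, v):
    # CGAN-style generator loss with a hypothetical linear discriminator
    # (weights v) that sees only the input x and the generated output y_hat.
    # The ground-truth label y does not appear anywhere in this loss.
    return -np.log(sigmoid(v @ np.concatenate([x, y_hat])))

def embed(y, w):
    # Hypothetical shared embedding used by the toy matching network.
    return np.tanh(w @ y)

def matching_generator_loss(y, y_hat, w):
    # Matching-style generator loss (toy version): score how well the
    # generated output y_hat matches the ground truth y, so the loss and
    # its gradient with respect to y_hat depend directly on y.
    a, b = embed(y, w), embed(y_hat, w)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return -np.log(sigmoid(cos))

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # toy input
y = rng.normal(size=3)        # toy ground-truth output
y_hat = rng.normal(size=3)    # toy generated output
w = rng.normal(size=(5, 3))   # hypothetical matching-network weights
v = rng.normal(size=7)        # hypothetical discriminator weights

cgan = cgan_generator_loss(x, y_hat, v)
# A perfect prediction versus the "opposite" prediction for the same y:
good = matching_generator_loss(y, y, w)
bad = matching_generator_loss(y, -y, w)
print(good < bad)  # embed is odd, so -y gives cosine -1 instead of +1
```

Because `cgan_generator_loss` takes no `y` argument, changing the ground truth cannot change its value, while `matching_generator_loss` is lowest when `y_hat` agrees with `y`.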