Abstract. Semantic segmentation has made much progress by using increasingly
powerful pixel-wise classifiers and by incorporating structural priors via Conditional
Random Fields (CRF) or Generative Adversarial Networks (GAN). We propose
a simpler alternative that learns to verify the spatial structure of segmentation
during training only. Unlike existing approaches that enforce semantic labels on
individual pixels and match labels between neighbouring pixels, we propose the
concept of Adaptive Affinity Fields (AAF) to capture and match the semantic
relations between neighbouring pixels in the label space. We use adversarial
learning to select the optimal affinity field size for each semantic category. It
is formulated as a minimax problem, optimizing our segmentation neural network
in a best worst-case learning scenario. AAF is versatile for representing
structures as a collection of pixel-centric relations, easier to train than GAN, and
more efficient than CRF because it requires no run-time inference. Our extensive evaluations on
PASCAL VOC 2012, Cityscapes, and GTA5 datasets demonstrate its above-par
segmentation performance and robust generalization across domains.