Abstract. Colorizing a given gray-level image is an important task in
the media and advertising industry. Due to the ambiguity inherent to
colorization (many shades are often plausible), recent approaches have started
to explicitly model diversity. However, one of the most obvious artifacts,
structural inconsistency, is rarely considered by existing methods, which
predict chrominance independently for every pixel. To address this issue,
we develop a conditional-random-field-based variational auto-encoder
formulation which is able to achieve diversity while taking into account
structural consistency. Moreover, we introduce a controllability
mechanism that can incorporate external constraints from diverse
sources, including a user interface. We demonstrate that, compared to
existing baselines, our method obtains more diverse and globally
consistent colorizations on the LFW, LSUN-Church and ILSVRC-2015
datasets.