Self-supervised Learning of Geometrically Stable Features Through
Probabilistic Introspection
Abstract
Self-supervision can dramatically reduce the amount of manually-labelled data required to train deep neural networks. While self-supervision has usually been considered for tasks such as image classification, in this paper we aim to extend it to geometry-oriented tasks such as semantic matching and part detection. We do so by building on several recent ideas in unsupervised landmark detection. Our approach learns dense distinctive visual descriptors from an unlabeled dataset of images using synthetic image transformations. It does so by means of a robust probabilistic formulation that can introspectively determine which image regions are likely to result in stable image matching. We show empirically that a network pre-trained in this manner requires significantly less supervision to learn semantic object parts than numerous pre-training alternatives. We also show that the pre-trained representation is excellent for semantic object matching.
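To illustrate the flavour of such introspective weighting, the following is a minimal NumPy sketch, not the paper's exact formulation: dense descriptors from an image and its synthetically transformed copy are compared at corresponding pixels, and a per-pixel predicted log-variance (the "introspection" signal, here random for illustration) down-weights the matching error in regions deemed unreliable, with a log-variance penalty preventing the degenerate solution of declaring every region unreliable. All array shapes and the heteroscedastic loss form are assumptions for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature-map size and descriptor depth (assumed values).
H, W, C = 8, 8, 16
desc1 = rng.standard_normal((H, W, C))                 # dense descriptors, original image
desc2 = desc1 + 0.1 * rng.standard_normal((H, W, C))   # descriptors of a warped copy
log_var = rng.standard_normal((H, W))                  # predicted per-pixel log-variance

# Squared descriptor mismatch at corresponding pixels
# (identity warp assumed for simplicity).
err = np.sum((desc1 - desc2) ** 2, axis=-1)

# Robust weighting: pixels predicted as unreliable (large variance) are
# down-weighted by exp(-log_var); the additive log_var term penalises
# the network for marking everything unreliable.
loss = np.mean(err * np.exp(-log_var) + log_var)
print(float(loss))
```

In a real training loop, `log_var` would be a second output head of the descriptor network, learned jointly with the descriptors rather than sampled at random.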