Abstract
We present a method for localizing facial keypoints on animals by transferring knowledge gained from human faces. Instead of directly finetuning a network trained to detect keypoints on human faces to animal faces (which is sub-optimal since human and animal faces can look quite different), we propose to first adapt the animal images to the pre-trained human detection network by correcting for the differences in animal and human face shape. We first find the nearest human neighbors for each animal image using an unsupervised shape matching method. We use these matches to train a thin plate spline warping network to warp each animal face to look more human-like. The warping network is then jointly finetuned with a pre-trained human facial keypoint detection network using an animal dataset. We demonstrate state-of-the-art results on both horse and sheep facial keypoint detection, and significant improvement over simple finetuning, especially when training data is scarce. Additionally, we present a new dataset with 3717 images with horse face and facial keypoint annotations.
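To make the thin plate spline (TPS) warp mentioned above concrete, the following is a minimal sketch of the standard TPS interpolation between two sets of 2D control points. It is not the warping network itself: the function names (`fit_tps`, `apply_tps`, `solve`, `tps_kernel`) are hypothetical, the control points are assumed to be matched animal and human landmark locations, and the learned prediction of the warp parameters is left out entirely.

```python
import math

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r), written in terms of r^2; U(0) = 0.
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear or repeated points?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(len(B[0]))] for i in range(n)]

def fit_tps(src, dst):
    """Fit a 2D TPS that maps each src control point onto its dst point.

    Returns n rows of kernel weights followed by 3 rows of affine terms
    (constant, x, y); each row holds one value per output coordinate.
    """
    n = len(src)
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    B = [[0.0, 0.0] for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        # Affine block and the side conditions that make the system square.
        for k, p in enumerate((1.0, xi, yi)):
            A[i][n + k] = p
            A[n + k][i] = p
        B[i] = [float(dst[i][0]), float(dst[i][1])]
    return solve(A, B)

def apply_tps(params, src, point):
    """Map an arbitrary 2D point through the fitted spline."""
    n = len(src)
    x, y = point
    out = []
    for d in range(2):
        val = params[n][d] + params[n + 1][d] * x + params[n + 2][d] * y
        for i, (xi, yi) in enumerate(src):
            val += params[i][d] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        out.append(val)
    return tuple(out)
```

Under these assumptions, `fit_tps(animal_points, human_points)` yields a smooth map that sends each animal control point exactly onto its human counterpart and interpolates elsewhere; when the two point sets differ only by an affine transform, the kernel weights vanish and the spline reduces to that transform.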