Abstract
We present a texture network called Deep Encoding
Pooling Network (DEP) for the task of ground terrain
recognition. Recognition of ground terrain is an important
task in establishing robot or vehicular control parameters,
as well as for localization within an outdoor environment.
The architecture of DEP integrates orderless texture details
and local spatial information, and the performance of DEP
surpasses state-of-the-art methods for this task. The GTOS
database (comprising over 30,000 images of 40 classes
of ground terrain in outdoor scenes) enables supervised
recognition. For evaluation under realistic conditions, we
use test images that are not from the existing GTOS dataset,
but are instead from hand-held mobile phone videos of
similar terrain. This new evaluation dataset, GTOS-mobile,
consists of 81 videos of 31 classes of ground terrain such
as grass, gravel, asphalt and sand. The resultant network
shows excellent performance not only for GTOS-mobile, but
also for more general databases (MINC and DTD).
Leveraging the discriminant features learned from this network,
we build a new texture manifold called DEP-manifold. We
learn a parametric distribution in feature space in a fully
supervised manner, which gives the distance relationship
among classes and provides a means to implicitly represent
ambiguous class boundaries. The source code and database
are publicly available.