Our ability to vividly describe the content of images is a clear demonstration of the power of the human visual system. Not only can we recognise objects in images (e.g. a cat, a person, or a car), but we can also describe them in minute detail, extracting an impressive amount of information at a glance. But visual perception is not limited to the recognition and description of objects. Prior to high-level semantic understanding, most textural patterns elicit a rich array of visual impressions. We could describe a texture as "polka dotted, regular, sparse, with blue dots on a white background"; or as "noisy, line-like, and irregular".
Our aim is to reproduce this capability in machines. Scientifically, the aim is to gain further insight into how textural information may be processed, analysed, and represented by an intelligent system. Compared to classic texture analysis tasks such as material recognition, these perceptual properties are much richer in variety and structure, inviting new technical challenges.
DTD is a texture database consisting of 5640 images, organized according to a list of 47 terms (categories) inspired by human perception. There are 120 images for each category. Image sizes range between 300x300 and 640x640, and in each image at least 90% of the surface represents the category attribute. The images were collected from Google and Flickr by entering our proposed attributes and related terms as search queries. The images were annotated using Amazon Mechanical Turk in several iterations. For each image we provide a key attribute (main category) and a list of joint attributes.
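The structure described above (one key attribute plus a list of joint attributes per image, 47 categories of 120 images each) can be sketched in Python as follows. This is only an illustration: the class name `TextureAnnotation`, its field names, and the helper `is_complete` are hypothetical and do not reflect the format of the distributed annotation files.

```python
from collections import Counter
from dataclasses import dataclass

NUM_CATEGORIES = 47          # number of describable terms
IMAGES_PER_CATEGORY = 120    # 47 * 120 = 5640 images in total


@dataclass(frozen=True)
class TextureAnnotation:
    """Hypothetical record for one annotated image."""
    image_id: str                 # assumed identifier, not the real file naming
    key_attribute: str            # main category of the image
    joint_attributes: tuple = ()  # further attributes that also apply


def is_complete(annotations):
    """Check that every category has exactly 120 images, keyed by key attribute."""
    counts = Counter(a.key_attribute for a in annotations)
    return (len(counts) == NUM_CATEGORIES
            and all(n == IMAGES_PER_CATEGORY for n in counts.values()))
```

A check of this kind is a cheap way to confirm that a local copy of the data matches the counts stated above before running any experiments.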
The data is split into three equal parts (train, validation, and test), with 40 images per class in each part. We provide the ground-truth annotations for both key and joint attributes, as well as the 10 splits of the data we used for evaluation.
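For readers who want to see what a class-balanced split of this shape looks like, here is a minimal Python sketch. It assumes a plain dictionary mapping each category to its 120 image names and uses random shuffling; how the 10 provided splits were actually generated is not stated here, so the function `make_balanced_splits` and its parameters are illustrative only. For results comparable with published numbers, the provided splits should be used instead.

```python
import random


def make_balanced_splits(images_by_category, per_part=40, num_splits=10, seed=0):
    """Build `num_splits` random train/val/test splits with `per_part` images
    per category in each part. Random shuffling is an assumption of this sketch."""
    rng = random.Random(seed)
    splits = []
    for _ in range(num_splits):
        split = {"train": [], "val": [], "test": []}
        for category in sorted(images_by_category):
            images = list(images_by_category[category])
            if len(images) != 3 * per_part:
                raise ValueError(f"{category}: expected {3 * per_part} images")
            rng.shuffle(images)
            split["train"] += images[:per_part]
            split["val"] += images[per_part:2 * per_part]
            split["test"] += images[2 * per_part:]
        splits.append(split)
    return splits
```

With 47 categories, each part of each split then holds 47 x 40 = 1880 images, and the three parts together cover all 5640 images exactly once.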