Description:
The LabelMe-12-50k dataset consists of 50,000 JPEG images (40,000 for training and 10,000 for testing), which were extracted from LabelMe [1]. Each image is 256x256 pixels in size. In both the training and the testing set, 50% of the images show a centered object belonging to one of the 12 object classes shown in Table 1. The remaining 50% show a randomly selected region of a randomly selected image ("clutter").
The dataset is a difficult challenge for object recognition systems because the instances of each object class vary greatly in appearance, lighting conditions, and viewing angle. Furthermore, centered objects may be partly occluded, or other objects (or parts of them) may be present in the image. See [1] for a more detailed description of the dataset.
Table 1: Object classes and number of instances in the LabelMe-12-50k dataset
| # | Object class | Instances in training set | Instances in testing set |
|---|---|---|---|
| 1 | person | 4,885 | 1,180 |
| 2 | car | 3,829 | 974 |
| 3 | building | 2,085 | 531 |
| 4 | window | 4,097 | 1,028 |
| 5 | tree | 1,846 | 494 |
| 6 | sign | 954 | 249 |
| 7 | door | 830 | 178 |
| 8 | bookshelf | 391 | 100 |
| 9 | chair | 385 | 88 |
| 10 | table | 192 | 54 |
| 11 | keyboard | 324 | 75 |
| 12 | head | 212 | 49 |
| | clutter | 20,000 | 5,000 |
| | total number of images | 40,000 | 10,000 |
Annotation format:
The dataset archive contains annotation files in two formats:
Human-readable text files (annotation-train.txt and annotation-test.txt), which contain in each line an image file name (without the .jpg extension) and 12 class labels corresponding to the 12 object classes.
Binary files (annotation-train.bin and annotation-test.bin), which contain 12 successive 32-bit float values for each image, each value representing the class label of the corresponding class. These files do not contain any meta information (e.g., there is no header).
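The two annotation formats can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names are hypothetical, the text fields are assumed to be separated by whitespace, and the binary floats are assumed to be little-endian IEEE 754 values, since the description states neither the separator nor the byte order.

```python
import struct
from pathlib import Path

NUM_CLASSES = 12                # number of object classes in the dataset
RECORD_SIZE = NUM_CLASSES * 4   # twelve 32-bit floats per image = 48 bytes


def read_binary_annotations(path):
    """Return one 12-tuple of floats per image from a headerless .bin file."""
    data = Path(path).read_bytes()
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("file size is not a multiple of 48 bytes")
    # ASSUMPTION: "<" selects little-endian byte order; the dataset
    # description does not state the byte order of the stored floats.
    return list(struct.iter_unpack("<%df" % NUM_CLASSES, data))


def read_text_annotations(path):
    """Return (image_name, labels) pairs from a .txt annotation file."""
    entries = []
    with open(path, encoding="ascii") as handle:
        for line in handle:
            # ASSUMPTION: fields are separated by whitespace.
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            name, values = fields[0], fields[1:]
            if len(values) != NUM_CLASSES:
                raise ValueError("expected 12 labels for image " + name)
            entries.append((name, tuple(float(v) for v in values)))
    return entries
```

Because the binary files have no header, the number of images follows directly from the file size divided by 48. The image file name is the name from the text file with the .jpg extension appended.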
The annotation label values of the two file formats differ slightly because the values in the text files are rounded to the second decimal place. If you want to report recognition rates, you should use the binary annotation files for training and testing because of the more precise label values.
All label values are between -1.0 and 1.0. For the 50% of non-clutter images, the label of the depicted object is set to 1.0. As instances of other object classes may also be present in the image (in object images as well as in clutter images), each of the other labels is either -1.0 or a value between 0.0 and 1.0. A label is set to -1.0 either if no instance of the object class is present in the image or if the degree of overlap (calculated from the size and position of the object's bounding box) is below a certain threshold. Values above 0.0 are assigned if this threshold is exceeded. A value of 1.0 means that the corresponding object is exactly centered in the image and 160 pixels in size (in its larger dimension), just like the extracted objects.
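The label semantics described above can be turned into a small helper that maps a 12-value label vector to a class name. This is only a sketch under two assumptions that the description does not confirm: that the labels are stored in the class order of Table 1, and that an image in which no label reaches 1.0 can be treated as clutter. The names `primary_class` and `present_classes` are hypothetical.

```python
# ASSUMPTION: labels are stored in the same order as the classes in Table 1.
CLASS_NAMES = [
    "person", "car", "building", "window", "tree", "sign",
    "door", "bookshelf", "chair", "table", "keyboard", "head",
]
CLUTTER = "clutter"


def primary_class(labels, tol=1e-6):
    """Return the class whose label is 1.0, or "clutter" if there is none.

    ASSUMPTION: an image without any label of 1.0 is a clutter image.
    """
    if len(labels) != len(CLASS_NAMES):
        raise ValueError("expected 12 labels")
    best = max(range(len(labels)), key=lambda i: labels[i])
    if labels[best] >= 1.0 - tol:
        return CLASS_NAMES[best]
    return CLUTTER


def present_classes(labels):
    """Return all classes with a label above 0.0 (overlap threshold exceeded)."""
    return [name for name, value in zip(CLASS_NAMES, labels) if value > 0.0]


# Example: a centered car with a partly overlapping window in the same image.
example = [-1.0] * 12
example[1] = 1.0   # car
example[3] = 0.4   # window
print(primary_class(example), present_classes(example))
```

The small tolerance in `primary_class` only guards against floating-point representation issues; the labels in the binary files should be used when exact values matter.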
Recognition rates:
Currently, the only results shown in Table 2 are from our paper [1]. If you would like to report recognition rates, please send them to uetz _at_ ais.uni-bonn.de, including a link to your publication or a description of the method you used.
Table 2: Training and testing error rates on the LabelMe-12-50k dataset
| Method used | Training error rate | Testing error rate | Reported by |
|---|---|---|---|
| Locally-connected Neural Pyramid | 3.77% | 16.27% | Uetz and Behnke 2009 [1] |