IMDB-WIKI: 500k face images with age and gender labels

2019-11-08

The IMDB-WIKI dataset

To the best of our knowledge, this is the largest publicly available dataset of face images with gender and age labels for training. We provide pretrained models for both age and gender prediction.

Description

Since publicly available face image datasets are often of small to medium size, rarely exceeding tens of thousands of images, and often lack age information, we decided to collect a large dataset of celebrities. For this purpose, we took the list of the 100,000 most popular actors as listed on the IMDb website and (automatically) crawled from their profiles the date of birth, name, gender and all images related to that person. Additionally, we crawled all profile images from pages of people on Wikipedia with the same meta information. We removed the images without a timestamp (the date when the photo was taken). Assuming that images with a single face are likely to show the actor and that the timestamp and date of birth are correct, we were able to assign to each such image the biological (real) age. Of course, we cannot vouch for the accuracy of the assigned age information. Besides wrong timestamps, many images are stills from movies, and movies can have extended production times. We obtained 460,723 face images from 20,284 celebrities from IMDb and 62,328 from Wikipedia, 523,051 in total.

As some of the images (especially from IMDb) contain several people, we only use the photos where the second strongest face detection is below a threshold. For the network to be equally discriminative for all ages, we equalize the age distribution for training. For more details, please see the paper.
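
The two selection steps above can be sketched in Python as follows. This is only an illustration, not the authors' code: the record layout (dicts with `age` and `second_face_score` keys), the threshold value, and the per-age cap used to flatten the distribution are all assumptions made for this example. The paper describes the actual procedure.

```python
import math
import random
from collections import defaultdict


def keep_single_face(records, threshold):
    """Keep records whose second-strongest detection is absent (NaN) or below threshold.

    `threshold` is a hypothetical parameter; the source does not state its value.
    """
    kept = []
    for r in records:
        s = r["second_face_score"]
        if math.isnan(s) or s < threshold:
            kept.append(r)
    return kept


def equalize_ages(records, cap, seed=0):
    """Keep at most `cap` randomly chosen records per integer age.

    Capping each age bucket is one simple way to flatten an age distribution;
    it is an assumption here, not necessarily the method used for the dataset.
    """
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for r in records:
        by_age[r["age"]].append(r)
    balanced = []
    for age in sorted(by_age):
        group = by_age[age]
        rng.shuffle(group)
        balanced.extend(group[:cap])
    return balanced
```

Calling `equalize_ages(keep_single_face(records, threshold), cap)` would then give a training subset with at most `cap` images per age.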

Usage

For both the IMDb and Wikipedia images we provide a separate .mat file, loadable with Matlab, containing all the meta information. The format is as follows:

  • dob: date of birth (Matlab serial date number)

  • photo_taken: year when the photo was taken

  • full_path: path to file

  • gender: 0 for female and 1 for male, NaN if unknown

  • name: name of the celebrity

  • face_location: location of the face. To crop the face in Matlab run

    img(face_location(2):face_location(4),face_location(1):face_location(3),:)

  • face_score: detector score (the higher the better). Inf implies that no face was found in the image, in which case face_location just returns the entire image

  • second_face_score: detector score of the face with the second highest score. This is useful to ignore images with more than one face. second_face_score is NaN if no second face was detected.

  • celeb_names (IMDb only): list of all celebrity names

  • celeb_id (IMDb only): index of celebrity name
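
For readers working outside Matlab, the face crop above can be sketched in Python. The Matlab expression uses 1-based, inclusive indices, so the start of each range shifts down by one while the end stays the same. The helper name `crop_face`, the list-of-rows image representation, and the rounding of coordinates to integers are assumptions of this sketch.

```python
def crop_face(img, face_location):
    """Crop `img` (a list of pixel rows) to face_location = [x1, y1, x2, y2].

    Assumes Matlab conventions for face_location: 1-based, inclusive coordinates.
    Rounding the coordinates to integers is an assumption of this sketch.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in face_location)
    # Matlab rows y1..y2 (inclusive) become Python rows y1-1 .. y2-1,
    # i.e. the slice [y1-1:y2]; the same shift applies to columns.
    return [row[x1 - 1:x2] for row in img[y1 - 1:y2]]
```

The same index shift applies to array libraries that support two-dimensional slicing, where the crop would read `img[y1 - 1:y2, x1 - 1:x2]`.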

The age of a person can be calculated based on the date of birth and the time when the photo was taken (note that we assume that the photo was taken in the middle of the year):

[age,~]=datevec(datenum(wiki.photo_taken,7,1)-wiki.dob);
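
A rough Python counterpart is sketched below. It relies on the fact that Matlab serial date 1 is 1 January of year 0000 while Python's ordinal 1 is 1 January of year 0001, a 366-day offset. The function names are made up for this example, and the age is computed as whole calendar years, which can differ by one from the Matlab expression above in edge cases (for instance a birthday exactly on 1 July).

```python
from datetime import date

# Matlab serial date 1 is 1 Jan of year 0000; Python ordinal 1 is 1 Jan of year 0001.
MATLAB_ORDINAL_OFFSET = 366


def matlab_datenum_to_date(datenum):
    """Convert a Matlab serial date number to a Python date (fractional day dropped).

    Raises ValueError for serial numbers before year 0001, which Python cannot represent.
    """
    return date.fromordinal(int(datenum) - MATLAB_ORDINAL_OFFSET)


def age_at_photo(dob_datenum, photo_taken):
    """Whole years between the date of birth and 1 July of the year `photo_taken`."""
    birth = matlab_datenum_to_date(dob_datenum)
    photo = date(photo_taken, 7, 1)  # mid-year assumption, as in the Matlab snippet
    had_birthday = (photo.month, photo.day) >= (birth.month, birth.day)
    return photo.year - birth.year - (0 if had_birthday else 1)
```

For example, `age_at_photo(730486, 2010)` treats the serial number 730486 as 1 January 2000 and yields an age of 10.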

