VGG Human Pose Estimation: Annotated Pose Image Data
YouTube Pose
The YouTube Pose dataset is a collection of 50 YouTube videos for human upper-body pose estimation. The videos cover a broad range of activities and people, e.g., dancing, stand-up comedy, how-to, sports, disk jockeys, performing arts and dancing sign language signers. One hundred frames from each video have been manually annotated with 2D locations of the upper-body joints.
BBC Pose [6]
BBC Pose consists of 20 videos (each 0.5h-1.5h in length) recorded from BBC with an overlaid sign language interpreter.
Split into train/validation/test
The 20 videos are split into 10 videos for training, 5 for validation and 5 for testing. The dataset contains 9 signers; the training and validation sets contain 5 of them, and the testing set contains the other 4. Splitting the data this way maintains enough diversity for training while keeping the evaluation fair, as the test set contains completely different signers from the training and validation sets.
Manual ground truth (validation & testing)
200 frames from each validation and test video are sampled by clustering the signers' poses (using tracking output from Buehler et al. CVPR'09; see Sect. 2 in [3]) and uniformly sampling frames across clusters, yielding in total 1,000 frames for validation and 1,000 frames for testing. Sampling in this manner ensures that the measured accuracy of joint estimates is not biased towards poses that occur more frequently. These 2,000 sampled frames are manually annotated with upper-body joint locations (head, wrists, elbows and shoulders).
Semi-automatic ground truth (training)
In addition to the manual ground truth labels above, all frames of all videos have been assigned joint locations using a semi-automatic but reliable tracker by Buehler et al. CVPR'09. These labels are used as ground truth for training.
Pose visualisation
The accompanying figure shows a scatter plot of stickmen in BBC Pose.
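The frame sampling described under the manual ground truth (cluster the poses, then sample uniformly across clusters) can be illustrated with a minimal sketch. It assumes each frame already carries a pose-cluster label; how the clusters were actually computed is not specified here, and the function name `sample_across_clusters` and the round-robin strategy are hypothetical choices for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def sample_across_clusters(cluster_ids, n_samples, seed=0):
    """Pick n_samples frame indices, spreading picks evenly over pose clusters.

    cluster_ids[i] is the (assumed, precomputed) pose-cluster label of frame i.
    """
    rng = random.Random(seed)

    # Group frame indices by their cluster label.
    frames_by_cluster = defaultdict(list)
    for frame_idx, cid in enumerate(cluster_ids):
        frames_by_cluster[cid].append(frame_idx)

    # Shuffle within each cluster so picks inside a cluster are random.
    pools = []
    for cid in sorted(frames_by_cluster):
        pool = frames_by_cluster[cid]
        rng.shuffle(pool)
        pools.append(pool)

    chosen = []
    # Round-robin: each pass takes at most one frame from every cluster,
    # so frequent poses cannot dominate the sample.
    while len(chosen) < n_samples and any(pools):
        for pool in pools:
            if pool and len(chosen) < n_samples:
                chosen.append(pool.pop())
    return sorted(chosen)


# One dominant pose (90 frames) and two rare ones (8 and 2 frames).
cluster_ids = [0] * 90 + [1] * 8 + [2] * 2
picked = sample_across_clusters(cluster_ids, n_samples=6)
counts = {c: sum(1 for i in picked if cluster_ids[i] == c) for c in (0, 1, 2)}
print(counts)  # {0: 2, 1: 2, 2: 2}
```

Sampling frames uniformly at random instead would return mostly cluster 0, which is the bias towards frequent poses that the cluster-based scheme avoids.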
Extended BBC Pose [4]
Extended BBC Pose contains all videos from the BBC Pose dataset plus 72 additional training videos. Combined with the original BBC Pose videos, the dataset contains 92 videos (82 training, 5 validation and 5 testing), i.e. around 7 million frames. The frames of the 72 new videos are automatically assigned joint locations (used as ground truth for training) with the tracker of Charles et al. IJCV'13 [4]. In practice, these 'ground truth' joint locations are slightly noisier than those in the original BBC Pose dataset (which were obtained using the slow, semi-automatic tracker of Buehler et al. CVPR'09).
Short BBC Pose [7]
Short BBC Pose contains five one-hour-long videos of sign language signers, each with a different sleeve length (in contrast to the above datasets, which only contain signers with moderately long sleeves). Each of the five videos has 200 test frames that have been manually annotated with joint locations, amounting to 1,000 test frames in total. Test frames were selected by the authors to contain a diverse range of poses.
ChaLearn Pose
ChaLearn Pose is a subset of the ChaLearn 2013 Multi-modal gesture dataset from Escalera et al. ICMI'13, which contains 23 hours of Kinect data of 27 persons performing 20 Italian gestures. The data includes RGB, depth, foreground segmentations and full body skeletons. In this dataset, both the training and testing labels are noisy (from Kinect).
Dataset statistics
Dataset | BBC Pose | Extended BBC Pose | Short BBC Pose | ChaLearn Pose | YouTube Pose |
Total videos | 20 | 92 | 5 | 955 | 50 |
Train videos | 10 | 82 (10 same) | - | 393 | - |
Val videos | 5 | 5 (same) | - | 287 | - |
Test videos | 5 | 5 (same) | 5 | 275 | 50 |
People | 9 | ~40 | 5 | 27 | 50 |
Frames | 1.5M | 7M | 380K | 1.3M | - |
Train labels | Buehler et al. | Buehler et al. (10) + Charles et al. (72) | - | Kinect | - |
Val labels | 1,000 manual GT | 1,000 manual GT (same) | - | Kinect | - |
Test labels | 1,000 manual GT | 1,000 manual GT (same) | 1,000 manual GT | 3,200 Kinect | 5,000 manual GT |
Evaluation
The accompanying code contains scripts for reproducing plots that compare pose estimation results against all VGG papers.
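The evaluation measure behind these plots is not stated here. As a hedged illustration only, the sketch below assumes a measure commonly used for 2D pose estimation: the fraction of predicted joints that fall within a given pixel distance of the ground truth, evaluated over a range of distances. The function names `accuracy_within` and `accuracy_curve` are hypothetical and are not taken from the VGG evaluation scripts.

```python
import math


def accuracy_within(pred, gt, max_dist):
    """Fraction of predicted joints within max_dist pixels of ground truth.

    pred and gt are equal-length lists of (x, y) joint locations.
    Assumed metric for illustration; not confirmed by the dataset page.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of joints")
    if not gt:
        return 0.0
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= max_dist
    )
    return hits / len(gt)


def accuracy_curve(pred, gt, thresholds):
    """Accuracy at each distance threshold, as (threshold, accuracy) pairs."""
    return [(d, accuracy_within(pred, gt, d)) for d in thresholds]


# Toy example: four joints with errors of 2, 5, 0 and 12 pixels.
gt = [(10, 10), (20, 20), (30, 30), (40, 40)]
pred = [(10, 12), (25, 20), (30, 30), (52, 40)]
print(accuracy_curve(pred, gt, [4, 6, 20]))  # [(4, 0.5), (6, 0.75), (20, 1.0)]
```

Plotting accuracy against the distance threshold gives one curve per method, which is one plausible form for a comparison across papers.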