Abstract
Deep learning methods have achieved great successesin pedestrian detection, owing to its ability to learn dis-criminative features from raw pixels. However, they treatpedestrian detection as a single binary classification task,which may confuse positive with hard negative samples(Fig.1 (a)). To address this ambiguity, this work jointly op-timize pedestrian detection with semantic tasks, including pedestrian attributes (e.g. ‘carrying backpack’) and sceneattributes (e.g. ‘vehicle’, ‘tree’, and ‘horizontal’). Ratherthan expensively annotating scene attributes, we transferattributes information from existing scene segmentationdatasets to the pedestrian dataset, by proposing a noveldeep model to learn high-level features from multiple tasks and multiple data sources. Since distinct tasks have distinctconvergence rates and data from different datasets havedifferent distributions, a multi-task deep model is carefullydesigned to coordinate tasks and reduce discrepanciesamong datasets. Extensive evaluations show that theproposed approach outperforms the state-of-the-art on the challenging Caltech [9] and ETH [10] datasets where itreduces the miss rates of previous deep models by 17 and 5.5 percent, respectively.