资源论文Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation

Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation

2019-10-08 | |  48 |   39 |   0

Abstract Pedestrian attribute recognition in surveillance is a challenging task in computer vision due to signifificant pose variation, viewpoint change and poor image quality. To achieve effective recognition, this paper presents a graph-based global reasoning framework to jointly model potential visualsemantic relations of attributes and distill auxiliary human parsing knowledge to guide the relational learning. The reasoning framework models attribute groups on a graph and learns a projection function to adaptively assign local visual features to the nodes of the graph. After feature projection, graph convolution is utilized to perform global reasoning between the attribute groups to model their mutual dependencies. Then, the learned node features are projected back to visual space to facilitate knowledge transfer. An additional regularization term is proposed by distilling human parsing knowledge from a pre-trained teacher model to enhance feature representations. The proposed framework is verifified on three large scale pedestrian attribute datasets including PETA, RAP, and PA- 100k. Experiments show that our method achieves state-of-the-art results

上一篇:Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving

下一篇:RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...