Attentive Fashion Grammar Network for
Fashion Landmark Detection and Clothing Category Classification
Abstract
This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g.,
fashion landmark localization and clothing category classification. The proposed fashion model leverages
high-level human knowledge in this domain. We propose
two important fashion grammars: (i) dependency grammar
capturing kinematics-like relations, and (ii) symmetry grammar accounting for the bilateral symmetry of clothes. We introduce Bidirectional Convolutional Recurrent Neural Networks (BCRNNs) for efficiently approximating message passing over grammar topologies, and producing regularized
landmark layouts. To enhance clothing category classification, our fashion network is equipped with two novel
attention mechanisms, i.e., landmark-aware attention and
category-driven attention. The former forces our network to focus on the functional parts of clothes and to learn
domain-knowledge-centered representations, leading to a
supervised attention mechanism. The latter is goal-driven,
which directly enhances task-related features and can be
learned in an implicit, top-down manner. Experimental results on large-scale fashion datasets demonstrate the superior performance of our fashion grammar network.
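
As an illustration of the landmark-aware attention idea, the following is a minimal sketch and not the implementation used in this work. It assumes that per-landmark heatmaps are collapsed into a single spatial attention map by a pixelwise maximum, and that features are reweighted as features * (1 + attention). The function names landmark_attention and apply_attention, the pixelwise-max fusion, and the residual-style reweighting are all assumptions made for this example.

```python
# Hypothetical sketch: landmark heatmaps -> spatial attention -> reweighted features.
# Plain nested lists are used so the example has no external dependencies.

def landmark_attention(heatmaps):
    """Collapse per-landmark heatmaps (K x H x W) into one H x W attention map.

    Assumption: fusion by pixelwise maximum over the K landmarks.
    """
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    return [
        [max(h[r][c] for h in heatmaps) for c in range(cols)]
        for r in range(rows)
    ]


def apply_attention(features, attention):
    """Reweight each feature channel (C x H x W) as value * (1 + attention).

    Assumption: the "1 +" term keeps the original features where attention is zero.
    """
    return [
        [
            [value * (1.0 + attention[r][c]) for c, value in enumerate(row)]
            for r, row in enumerate(channel)
        ]
        for channel in features
    ]


# Two toy landmark heatmaps on a 2 x 2 grid and a single constant feature channel.
heatmaps = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.5, 0.0]],
]
features = [[[2.0, 2.0], [2.0, 2.0]]]

attention = landmark_attention(heatmaps)
attended = apply_attention(features, attention)
```

In this toy setting, positions near a landmark are amplified while the remaining positions keep their original feature values, which is the sense in which such attention directs the network toward functional clothing regions.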