Abstract
This paper presents a novel descriptor for human detection in video sequences. The descriptor, called spatial-temporal granularity-tunable gradients partition (STGGP), extends the granularity-tunable gradients partition (GGP) from the still-image domain to the spatial-temporal domain. Specifically, the moving human body is treated as a 3-dimensional entity in the spatial-temporal domain. In 3D Hough space, we then define the generalized plane as a primitive for parsing the structure of this 3D entity. The advantage of the generalized plane is that it tolerates imperfect planes with a certain level of uncertainty in rotation and translation. The robustness to this uncertainty is controlled quantitatively by granularity parameters defined explicitly in the generalized plane. This property gives the STGGP descriptor the versatility to represent both the deterministic structures and the statistical summarizations of the object. Moreover, the STGGP descriptor encodes heterogeneous information, such as the gradients’ strength, position, and distribution, as well as their temporal motion, to enrich its representation ability. We evaluate STGGP on human detection in video sequences on public datasets, and the results are very promising.
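To make the idea of a plane that tolerates rotation and translation uncertainty concrete, the following is a minimal sketch and not the paper's formulation. It assumes a plane in 3D Hough space is parameterized by two angles and an offset, (theta, phi, rho), and that the two granularity parameters act as simple thresholds on angular deviation and on distance from the ideal plane. The names `in_generalized_plane`, `rot_tol`, and `trans_tol`, and the choice to ignore gradient polarity, are all hypothetical.

```python
import math


def plane_normal(theta, phi):
    """Unit normal of a plane from its Hough angles (azimuth theta, polar angle phi)."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def dot(a, b):
    """Dot product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))


def in_generalized_plane(point, gradient, theta, phi, rho, rot_tol, trans_tol):
    """Return True if a gradient sample falls inside the tolerant plane.

    point     -- (x, y, t) position of the sample
    gradient  -- (gx, gy, gt) spatial-temporal gradient at that position
    rot_tol   -- assumed rotation granularity, in radians
    trans_tol -- assumed translation granularity, in the units of rho
    """
    n = plane_normal(theta, phi)
    norm = math.sqrt(dot(gradient, gradient))
    if norm == 0.0:
        return False  # a zero gradient carries no orientation evidence
    # Angle between the gradient and the plane normal; polarity is ignored
    # here, which is an assumption of this sketch.
    cos_angle = min(1.0, abs(dot(gradient, n)) / norm)
    angle = math.acos(cos_angle)
    # Distance of the sample position from the ideal plane n . p = rho.
    offset = abs(dot(point, n) - rho)
    return angle <= rot_tol and offset <= trans_tol
```

Under these assumptions, small tolerances accept only samples that lie almost exactly on the ideal plane, which corresponds to a deterministic structure. Large tolerances pool gradients over a wide range of orientations and positions, which behaves more like a statistical summary.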