Abstract
The visual analysis of human manipulation actions is of interest for, e.g., human-robot interaction applications where a robot learns how to perform a task by watching a human. This paper presents a method for classifying manipulation actions in the context of the objects manipulated, and for classifying objects in the context of the actions used to manipulate them. Hand and object features are extracted from the video sequence using a segmentation-based approach. A shape-based representation is used for both the hand and the object; experiments show that this representation is suitable for representing generic shape classes. The action-object correlation over time is then modeled using conditional random fields. Experimental comparisons show a large improvement in classification rate when the action-object correlation is taken into account, compared to separate classification of manipulation actions and manipulated objects.