Abstract
Physical fluents, a term originally used by Newton [40], refer to time-varying object states in dynamic scenes. In this paper, we are interested in inferring the fluents of vehicles from video. For example, a door (hood, trunk) is opened or closed through various actions, and a light blinks to signal a turn. Recognizing these fluents has broad applications, yet it has received scant attention in the computer vision literature. Car fluent recognition requires a unified framework for car detection, car part localization and part status recognition. This is made difficult by large structural and appearance variations, low resolutions and occlusions. This paper learns a spatial-temporal And-Or hierarchical model to represent car fluents. The learning of this model is formulated under the latent structural SVM framework. Since there is no publicly available related dataset, we collect and annotate a car fluent dataset consisting of car videos with diverse fluents. In experiments, the proposed method outperforms several highly related baseline methods in terms of car fluent recognition and car part localization.