资源论文CLEVRER: COLLISION EVENTS FORV IDEO REPRESENTATION AND REASONING

CLEVRER: COLLISION EVENTS FORV IDEO REPRESENTATION AND REASONING

2020-01-02 | |  78 |   61 |   0

Abstract

The ability to reason about temporal and causal events from videos lies at the core of human intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from complex visual and language input, instead of on causal structure. We study the complementary problem, exploring the temporal and causal structures behind videos of objects with simple visual appearance. To this end, we introduce the CoLlision Events for Video REpresentation and Reasoning (CLEVRER) dataset, a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of question: descriptive (e.g., ‘what color’), explanatory (‘what’s responsible for’), predictive (‘what will happen next’), and counterfactual (‘what if’). We evaluate various state-of-the-art models for visual reasoning on our benchmark. While these models thrive on the perceptionbased task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations. We also study an oracle model that explicitly combines these components via symbolic representations. CLEVRER will be made publicly available.

上一篇:GUIDING PROGRAM SYNTHESIS BYL EARNING TO GENERATE EXAMPLES

下一篇:TO RELIEVE YOUR HEADACHE OF TRAINING AN MRF,TAKE ADVIL

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...