Abstract
Predicting future actions from observed partial videos is challenging because the missing future is uncertain and sometimes admits multiple possibilities. To obtain a reliable future estimation, we propose a novel encoder-decoder architecture that integrates two tasks in a unified framework: synthesizing future motions from observed videos, and reconstructing observed motions from the synthesized future motions. This design captures the bi-directional dynamics depicted in partial videos along both the temporal (past-to-future) direction and the reverse chronological (future-back-to-past) direction. We then employ a bi-directional long short-term memory (Bi-LSTM) architecture to exploit the learned bi-directional dynamics for predicting early actions. Our experiments on two benchmark action datasets show that learning bi-directional dynamics benefits early action prediction and that our system clearly outperforms state-of-the-art methods.
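
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the two components the abstract names: an encoder-decoder that synthesizes future motion and decodes it back toward the past, and a Bi-LSTM classifier over the combined sequence. All module names, layer sizes, and the GRU-based decoders are illustrative assumptions, not the paper's actual specification.

```python
# Hypothetical sketch; layer choices and dimensions are assumptions.
import torch
import torch.nn as nn

class BiDirectionalDynamics(nn.Module):
    """Encoder-decoder: synthesize future motion from an observed clip,
    then reconstruct the observed motion back from that synthesis."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.future_decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # past -> future
        self.past_decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)    # future -> back to past
        self.to_feat = nn.Linear(hidden_dim, feat_dim)

    def forward(self, observed, future_len):
        # observed: (batch, T_obs, feat_dim) frame-level motion features
        _, h = self.encoder(observed)                    # summarize observed frames
        seed = h[-1].unsqueeze(1).repeat(1, future_len, 1)
        future_h, _ = self.future_decoder(seed)          # temporal (past-to-future) pass
        future = self.to_feat(future_h)                  # synthesized future motions
        back_h, _ = self.past_decoder(future_h.flip(1))  # reverse chronological pass
        recon = self.to_feat(back_h).flip(1)             # reconstructed observed motions
        return future, recon

class EarlyActionPredictor(nn.Module):
    """Bi-LSTM over observed plus synthesized-future features for early
    action prediction."""
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=101):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.cls = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, observed, future):
        seq = torch.cat([observed, future], dim=1)       # past + predicted future
        out, _ = self.bilstm(seq)
        return self.cls(out[:, -1])                      # action logits
```

In this reading, the reconstruction branch acts as a consistency check: a synthesized future that cannot be decoded back into the observed past is penalized during training, which is one plausible way to realize the bi-directional dynamics the abstract describes.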