Abstract. This work introduces double-mapping Gated Recurrent Units
(dGRU), an extension of standard GRUs in which the input is treated
as a recurrent state. An extra set of logic gates is added to update the input given the output. Stacking multiple such layers results in a recurrent
auto-encoder: the operators updating the outputs comprise the encoder,
while the ones updating the inputs form the decoder. Since the states
are shared between corresponding encoder and decoder layers, the representation is stratified during learning: some information is not passed
to the next layers. We test our model on future video prediction. The main
challenges for this task include high variability in videos, temporal propagation of errors, and non-specificity of future frames. We show that only
the encoder needs to be applied for encoding, and only the decoder for prediction.
This reduces the computational cost and avoids re-encoding predictions
when generating multiple frames, mitigating error propagation. Furthermore, it is possible to remove layers from a trained model, giving
insight into the role of each layer. Our approach improves on state-of-the-art
results on MMNIST and UCF101, and is competitive on KTH while using 2 and
3 times less memory and computation, respectively, than the best-scoring
approach.
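
The double-mapping idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the extra gates form a second, standard GRU update with the roles of input and output swapped, and it uses dense weights on small vectors instead of video frames. The names GRUGates, DGRULayer, encode_step and decode_step are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUGates:
    """Standard GRU update (biases omitted): new `state` given a `signal`."""

    def __init__(self, signal_dim, state_dim, rng):
        width = signal_dim + state_dim
        self.Wz = rand_matrix(state_dim, width, rng)  # update gate
        self.Wr = rand_matrix(state_dim, width, rng)  # reset gate
        self.Wh = rand_matrix(state_dim, width, rng)  # candidate state

    def __call__(self, signal, state):
        cat = signal + state
        z = [sigmoid(a) for a in matvec(self.Wz, cat)]
        r = [sigmoid(a) for a in matvec(self.Wr, cat)]
        gated = signal + [ri * si for ri, si in zip(r, state)]
        h = [math.tanh(a) for a in matvec(self.Wh, gated)]
        return [(1 - zi) * si + zi * hi for zi, si, hi in zip(z, state, h)]


class DGRULayer:
    """Assumed dGRU layer: two gate sets sharing the same pair of states."""

    def __init__(self, in_dim, out_dim, rng):
        self.forward_gates = GRUGates(in_dim, out_dim, rng)   # output given input
        self.backward_gates = GRUGates(out_dim, in_dim, rng)  # input given output

    def encode(self, inp, out):
        return self.forward_gates(inp, out)

    def decode(self, inp, out):
        return self.backward_gates(out, inp)


def encode_step(layers, states, frame):
    # Bottom-up pass: only the encoder gates are applied.
    states = [list(frame)] + [list(s) for s in states[1:]]
    for l, layer in enumerate(layers):
        states[l + 1] = layer.encode(states[l], states[l + 1])
    return states


def decode_step(layers, states):
    # Top-down pass: only the decoder gates are applied; states[0] is the prediction.
    states = [list(s) for s in states]
    for l in reversed(range(len(layers))):
        states[l] = layers[l].decode(states[l], states[l + 1])
    return states


rng = random.Random(0)
dims = [4, 3, 2]  # toy sizes: input, first layer, second layer
layers = [DGRULayer(dims[i], dims[i + 1], rng) for i in range(len(dims) - 1)]
frame = [0.2, 0.4, 0.6, 0.8]
encoded = encode_step(layers, [[0.0] * d for d in dims], frame)
predicted = decode_step(layers, encoded)
```

In this sketch, several future frames could be generated by calling decode_step repeatedly on its own output, without an encode_step in between, which mirrors the claim that predictions need not be re-encoded. Removing a layer amounts to dropping the last element of layers and of the state list.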