These models are described in this NAACL-HLT 2015 paper.
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, K. Saenko
NAACL-HLT 2015
More details can be found on this project page.
Model:
Video2Text_VGG_mean_pool:
This model is an improved version of the mean pooled model described in the NAACL-HLT 2015 paper.
It uses video frame features from the 16-layer VGG model (VGG-16) and is trained only on the YouTube video dataset.
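The mean pooled model represents a whole video as a single vector by averaging the per-frame CNN features. That vector is then fed to the recurrent (LSTM) language model that generates the sentence. Below is a minimal sketch of only the pooling step, in plain Python. The function name `mean_pool` is hypothetical and is not part of the released Caffe model. The toy 4-dimensional vectors stand in for real VGG-16 frame features. Which VGG-16 layer supplies those features is an assumption here and is not stated on this page.

```python
from typing import List, Sequence


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector.

    Hypothetical helper for illustration: frame_features holds one feature
    vector per sampled frame (assumed to come from VGG-16), and every
    vector must have the same length.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same dimension")
    n = len(frame_features)
    # Sum each feature dimension across frames, then divide by the frame count.
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


if __name__ == "__main__":
    # Toy 3-frame "video" with 4-dim features; real CNN features are far longer.
    frames = [
        [1.0, 0.0, 2.0, 4.0],
        [3.0, 0.0, 2.0, 0.0],
        [2.0, 3.0, 2.0, 2.0],
    ]
    print(mean_pool(frames))  # [2.0, 1.0, 2.0, 2.0]
```

Because averaging discards frame order, the pooled vector has the same size however many frames the clip contains. This is what lets a fixed-size input drive the sentence generator, at the cost of ignoring temporal structure.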
Compatibility:
These are pre-release models. They do not run in any current version of BVLC/caffe because they require unmerged pull requests. They are currently supported by the recurrent
branch of the Caffe forks at https://github.com/jeffdonahue/caffe/tree/recurrent and https://github.com/vsubhashini/caffe/tree/recurrent.